Swire Coca-Cola Capstone Modeling - Group 8

Author : Charith Reddy Gopavaram

Table of contents:¶

  1. Introduction
  2. Load libraries and packages
  3. Import dataset and inspect
  4. Missing Value Analysis
    4.1 Imputation of Missing Value's
  5. Univariate Analysis
    5.1 Distribution of actual work in minutes
    5.2 Breakdown counts by production location
    5.3 Breakdown trends over years
    5.4 Number of maintenance activities per month in 2023
    5.5 Frequency of maintenance activity type
  6. Bivariate Analysis
    6.1 Average downtime by maintenance activity type
    6.2 Average downtime by plant
    6.3 Count of Maintenance type by Maintenance activity type
    6.4 Production line Analysis using Functional Area
    6.4.1 Average downtime by Functional Area node 4
    6.4.2 Average downtime by Functional Area node 5
    6.5 Frequency of maintenance type across equipment categories
    6.6 Equipment Age vs Actual work in minutes
  7. Multivariate Analysis
    7.1 Average downtime by maintenance activity type for different plants
    7.2 Maintenance type by equipment age and downtime
  8. Outlier Analysis
  9. Feature Engineering
    9.1 Breakdown analysis by Equipment Type
    9.2 Breakdown Timeline Analysis
    9.3 Machine Location and Granular Analysis
    9.4 Nature of Repairs and Correction Methods
    9.5 Equipment-Specific Time Between Failures
    9.6 Maintenance Frequency and Downtime
    9.7 Time Until Next Downtime Post-Maintenance
    9.8 Seasonal Impact and Cost Analysis
    9.9 Key Features of Modeling
  10. Model Selection
  11. Modeling
    11.1 ARIMA Model
    11.2 Functional Location Segmentation for Process Analysis
    11.3 Kaplan-Meier Model
    11.4 Cox Proportional Model
    11.5 Leveraging Kaplan-Meier Model for Enhanced Insights
    11.6 Kaplan-Meier Analysis for Functional Nodes
  12. Results From Mdoeling
  13. Summary and Predictions
  14. Individual Contribution

1. Introduction ¶

Swire Coca-Cola is one of the five largest Coca-Cola bottlers in the US. Every year, they produce close to 192 million cases of beverages to sell in its markets in 13 states. Recently, Swire CocaCola is experiencing significant losses due to frequent and unplanned machine downtimes in its 6 production plants. These machine downtimes are costing the company greatly as only about 94.4% of ordered cases get produced while the rest are lost in unexpected mechanical failures. When these mechanical failures occur unexpectedly, there are unplanned downtimes that reduce efficiency and productivity at the plants. Due to these unplanned breakdowns, it costs the business an estimated $60 million annually, bringing business to a grinding halt and delaying the time for delivery. The current maintenance strategy is inefficient, as machines are repaired only after breakdown occur, resulting in prolonged periods of inactivity. Swire Coca-Cola wants to further improve the accuracy of these downtime predictions in order to avoid disruptions, decrease the cost of unplanned maintenance, and optimize full production capacity.

In alignment with the CRISP-DM framework, our initial focus will be on data understanding to gain insights into patterns and features critical to machine breakdowns. This includes tasks such as data import and inspection, missing values analysis, exploratory visualizations, and various analyses (univariate, bivariate, multivariate) to understand relationships between factors. Outlier analysis will also be conducted to identify anomalies that might be influencing downtime.

Feature importance will play a crucial role in understanding the factors contributing to these breakdowns. Key features include:

  • What: Understanding what component broke down, with descriptions providing essential context.
  • When: Timestamps of breakdown events help establish time-based patterns and trends.
  • Why: Conditions and descriptions in breakdown cases may offer clues to the underlying causes of failures.
  • Where: Machine location data, available at varying granularities, enables breakdown analysis by machine type or specific components within machines.
  • How: Information about the type of fix used (from the IWC dataset) helps infer the method of correction.

In the modeling phase, we will explore survival analysis techniques like the Kaplan-Meier and Cox proportional hazards model to analyze time-to-failure data and identify factors associated with breakdown risks. Additionally, ARIMA modeling will be applied to time series data to forecast potential failure events. These models will help predict machine downtimes, allowing proactive maintenance scheduling to prevent costly breakdowns.

Ultimately, this predictive maintenance project will provide Swire Coca-Cola with the insights needed to transition from reactive to proactive maintenance, improving uptime, optimizing production, and ensuring a steady flow of products to market. This shift will support Swire Coca-Cola’s goals of operational excellence and customer satisfaction while reducing downtime-related costs.

2. Load Libraries and packages ¶

In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import wordcloud
from wordcloud import WordCloud
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
from pmdarima import auto_arima
from datetime import datetime
from lifelines import KaplanMeierFitter
from lifelines import CoxPHFitter

warnings.filterwarnings("ignore")

3. Import dataset and inspect ¶

In [ ]:
data_swire = pd.read_csv('IWC_Work_Orders_Extract.csv', low_memory=False)

data_swire_copy = data_swire.copy()

data_swire_copy.head(n=10)
Out[ ]:
ORDER_ID PLANT_ID PRODUCTION_LOCATION EXECUTION_START_DATE EXECUTION_FINISH_DATE ACTUAL_START_TIME ACTUAL_FINISH_TIME ACTUAL_WORK_IN_MINUTES MAINTENANCE_PLAN MAINTENANCE_ITEM ... FUNCTIONAL_AREA_NODE_2_MODIFIED FUNCTIONAL_AREA_NODE_3_MODIFIED FUNCTIONAL_AREA_NODE_4_MODIFIED FUNCTIONAL_AREA_NODE_5_MODIFIED EQUIPMENT_ID EQUIPMENT_DESC EQUIP_CAT_DESC EQUIP_START_UP_DATE EQUIP_VALID_FROM EQUIP_VALID_TO
0 705642457 G812 ROMA 2024-05-04 2024-05-12 06:00:00.000 23:04:08.000 390.0 NaN NaN ... G812 PRD FILLER_ROTARY_CAN G812 PRD FILLER_ROTARY_CAN LINE 3 CONVEYOR FULL CAN CONVEYOR 300126812.0 FULL CAN CONVEYOR (FC41B) Machines 2020-04-22 2020-02-10 9999-12-31
1 704191697 G812 ROMA 2022-09-13 2022-09-13 06:00:00.000 17:17:24.000 420.0 NaN NaN ... G812 PRD FILLER_ROTARY_CAN G812 PRD FILLER_ROTARY_CAN LINE 3 CONVEYOR FULL CAN CONVEYOR 300126812.0 FULL CAN CONVEYOR (FC41B) Machines 2020-04-22 2020-02-10 9999-12-31
2 704466547 G812 ROMA 2022-12-21 2022-12-21 07:00:00.000 07:00:00.000 30.0 NaN NaN ... G812 PRD FILLER_ROTARY_CAN G812 PRD FILLER_ROTARY_CAN LINE 3 CONVEYOR FULL CAN CONVEYOR 300126812.0 FULL CAN CONVEYOR (FC41B) Machines 2020-04-22 2020-02-10 9999-12-31
3 703834477 G812 ROMA 2022-07-04 2022-07-04 06:00:00.000 06:00:00.000 60.0 NaN NaN ... G812 PRD FILLER_ROTARY_CAN G812 PRD FILLER_ROTARY_CAN LINE 3 CONVEYOR FULL CAN CONVEYOR 300126812.0 FULL CAN CONVEYOR (FC41B) Machines 2020-04-22 2020-02-10 9999-12-31
4 704661125 G291 MONZA 2023-03-15 2023-03-15 07:00:00.000 07:00:00.000 120.0 NaN NaN ... AIR SYSTEMS NaN NaN NaN 300001088.0 TANK_STL_STOR_AIR Machines 2017-02-15 2023-12-11 9999-12-31
5 704948720 G291 MONZA 2023-09-08 2023-09-08 07:00:00.000 07:00:00.000 18.0 G291VM1437 40534.0 ... AIR SYSTEMS NaN NaN NaN 300001088.0 TANK_STL_STOR_AIR Machines 2017-02-15 2023-12-11 9999-12-31
6 701673970 G291 MONZA 2019-09-06 2019-09-06 07:00:00.000 07:00:00.000 480.0 NaN NaN ... AIR SYSTEMS NaN NaN NaN 300001088.0 TANK_STL_STOR_AIR Machines 2017-02-15 2023-12-11 9999-12-31
7 705148892 G291 MONZA 2023-10-09 2023-10-09 07:00:00.000 07:00:00.000 30.0 NaN NaN ... AMMONIA & REFRIGERATION SYSTEMS NaN NaN NaN 300001033.0 PUMP_RECIP_HIGH_PRESSURE_NF Machines 2017-02-15 2023-12-11 9999-12-31
8 704769592 G291 MONZA 2023-05-01 2023-05-01 07:00:00.000 07:00:00.000 30.0 NaN NaN ... AMMONIA & REFRIGERATION SYSTEMS NaN NaN NaN 300001033.0 PUMP_RECIP_HIGH_PRESSURE_NF Machines 2017-02-15 2023-12-11 9999-12-31
9 704448350 G291 MONZA 2022-12-13 2022-12-13 08:00:00.000 08:00:00.000 30.0 NaN NaN ... AMMONIA & REFRIGERATION SYSTEMS NaN NaN NaN 300001033.0 PUMP_RECIP_HIGH_PRESSURE_NF Machines 2017-02-15 2023-12-11 9999-12-31

10 rows × 25 columns

In [ ]:
# Check column names and data types
data_swire_copy.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1427264 entries, 0 to 1427263
Data columns (total 25 columns):
 #   Column                           Non-Null Count    Dtype  
---  ------                           --------------    -----  
 0   ORDER_ID                         1427264 non-null  int64  
 1   PLANT_ID                         1427264 non-null  object 
 2   PRODUCTION_LOCATION              1427264 non-null  object 
 3   EXECUTION_START_DATE             1427264 non-null  object 
 4   EXECUTION_FINISH_DATE            1427264 non-null  object 
 5   ACTUAL_START_TIME                1427264 non-null  object 
 6   ACTUAL_FINISH_TIME               1427264 non-null  object 
 7   ACTUAL_WORK_IN_MINUTES           1427264 non-null  float64
 8   MAINTENANCE_PLAN                 152594 non-null   object 
 9   MAINTENANCE_ITEM                 152594 non-null   float64
 10  MAINTENANCE_ACTIVITY_TYPE        1427264 non-null  object 
 11  ORDER_DESCRIPTION                291767 non-null   object 
 12  MAINTENANCE_TYPE_DESCRIPTION     291795 non-null   object 
 13  FUNCTIONAL_LOC                   291782 non-null   object 
 14  FUNCTIONAL_AREA_NODE_1_MODIFIED  290117 non-null   object 
 15  FUNCTIONAL_AREA_NODE_2_MODIFIED  282407 non-null   object 
 16  FUNCTIONAL_AREA_NODE_3_MODIFIED  258691 non-null   object 
 17  FUNCTIONAL_AREA_NODE_4_MODIFIED  237573 non-null   object 
 18  FUNCTIONAL_AREA_NODE_5_MODIFIED  2180 non-null     object 
 19  EQUIPMENT_ID                     291795 non-null   float64
 20  EQUIPMENT_DESC                   79447 non-null    object 
 21  EQUIP_CAT_DESC                   79447 non-null    object 
 22  EQUIP_START_UP_DATE              79447 non-null    object 
 23  EQUIP_VALID_FROM                 79447 non-null    object 
 24  EQUIP_VALID_TO                   79447 non-null    object 
dtypes: float64(3), int64(1), object(21)
memory usage: 272.2+ MB

The dataset comprises 1,427,264 entries across 25 columns, containing a mix of integer, float, and object data types. The integer (ORDER_ID) and float (ACTUAL_WORK_IN_MINUTES, MAINTENANCE_ITEM, and EQUIPMENT_ID) columns represent identifiers and numerical values, while the majority of columns (21) are of type object, which likely includes categorical or date-related information. Several columns related to dates, such as EXECUTION_START_DATE, EXECUTION_FINISH_DATE, and EQUIP_START_UP_DATE, are in object format and should be converted to datetime for accurate temporal analysis. Notably, some columns, such as MAINTENANCE_PLAN, MAINTENANCE_ITEM, and EQUIP_VALID_FROM, exhibit a significant number of missing values, indicating incomplete records that require appropriate handling. Effective imputation strategies for missing values, categorical encoding, and outlier analysis are recommended to ensure data quality and support predictive modeling.

In [ ]:
data_swire_copy.shape
Out[ ]:
(1427264, 25)

The dataset has a shape of (1,427,264, 25), indicating that it comprises 1,427,264 rows and 25 columns. The large number of rows suggests the data was collected over a considerable period or from multiple sources, which provides a rich dataset for analysis. The 25 columns represent various attributes related to equipment, maintenance activities, and operational details, offering a broad set of features that could be valuable for predictive maintenance analysis.

In [ ]:
median_survival_times_dict = {}
for equipment_type in equipment_types:
    type_data = data_swire_copy[data_swire_copy['Equipment_Type'] == equipment_type.capitalize()]
    if not type_data.empty:
        kmf.fit(type_data['machine_age'], event_observed=type_data['failure_event'])
        median_survival = median_survival_times(kmf.survival_function_)
        median_survival_times_dict[equipment_type] = median_survival.values[0][0]

# Simulate the impact of planned maintenance
data_swire_copy['preventive_maintenance'] = data_swire_copy['machine_age'].apply(
    lambda x: 1 if any(x < median_survival_times_dict[etype] for etype in equipment_types if etype in str(data_swire_copy['EQUIPMENT_DESC'].iloc[0]).lower()) else 0
)

# Compare unplanned maintenance with and without the intervention
before_intervention = data_swire_copy['failure_event'].sum()
after_intervention = data_swire_copy.loc[data_swire_copy['preventive_maintenance'] == 0, 'failure_event'].sum()

print(f"Unplanned maintenance events before intervention: {before_intervention}")
print(f"Unplanned maintenance events after intervention: {after_intervention}")

# Calculate downtime reduction
downtime_reduction = (before_intervention - after_intervention) / before_intervention * 100
print(f"Estimated reduction in unplanned maintenance events: {downtime_reduction:.2f}%")
Unplanned maintenance events before intervention: 79447
Unplanned maintenance events after intervention: 79447
Estimated reduction in unplanned maintenance events: 0.00%
In [ ]:
data_swire_copy.dtypes
Out[ ]:
0
ORDER_ID int64
PLANT_ID object
PRODUCTION_LOCATION object
EXECUTION_START_DATE object
EXECUTION_FINISH_DATE object
ACTUAL_START_TIME object
ACTUAL_FINISH_TIME object
ACTUAL_WORK_IN_MINUTES float64
MAINTENANCE_PLAN object
MAINTENANCE_ITEM float64
MAINTENANCE_ACTIVITY_TYPE object
ORDER_DESCRIPTION object
MAINTENANCE_TYPE_DESCRIPTION object
FUNCTIONAL_LOC object
FUNCTIONAL_AREA_NODE_1_MODIFIED object
FUNCTIONAL_AREA_NODE_2_MODIFIED object
FUNCTIONAL_AREA_NODE_3_MODIFIED object
FUNCTIONAL_AREA_NODE_4_MODIFIED object
FUNCTIONAL_AREA_NODE_5_MODIFIED object
EQUIPMENT_ID float64
EQUIPMENT_DESC object
EQUIP_CAT_DESC object
EQUIP_START_UP_DATE object
EQUIP_VALID_FROM object
EQUIP_VALID_TO object

In [ ]:
data_swire_copy.columns
Out[ ]:
Index(['ORDER_ID', 'PLANT_ID', 'PRODUCTION_LOCATION', 'EXECUTION_START_DATE',
       'EXECUTION_FINISH_DATE', 'ACTUAL_START_TIME', 'ACTUAL_FINISH_TIME',
       'ACTUAL_WORK_IN_MINUTES', 'MAINTENANCE_PLAN', 'MAINTENANCE_ITEM',
       'MAINTENANCE_ACTIVITY_TYPE', 'ORDER_DESCRIPTION',
       'MAINTENANCE_TYPE_DESCRIPTION', 'FUNCTIONAL_LOC',
       'FUNCTIONAL_AREA_NODE_1_MODIFIED', 'FUNCTIONAL_AREA_NODE_2_MODIFIED',
       'FUNCTIONAL_AREA_NODE_3_MODIFIED', 'FUNCTIONAL_AREA_NODE_4_MODIFIED',
       'FUNCTIONAL_AREA_NODE_5_MODIFIED', 'EQUIPMENT_ID', 'EQUIPMENT_DESC',
       'EQUIP_CAT_DESC', 'EQUIP_START_UP_DATE', 'EQUIP_VALID_FROM',
       'EQUIP_VALID_TO'],
      dtype='object')
In [ ]:
data_swire_copy.isnull().sum()
Out[ ]:
0
ORDER_ID 0
PLANT_ID 0
PRODUCTION_LOCATION 0
EXECUTION_START_DATE 0
EXECUTION_FINISH_DATE 0
ACTUAL_START_TIME 0
ACTUAL_FINISH_TIME 0
ACTUAL_WORK_IN_MINUTES 0
MAINTENANCE_PLAN 1274670
MAINTENANCE_ITEM 1274670
MAINTENANCE_ACTIVITY_TYPE 0
ORDER_DESCRIPTION 1135497
MAINTENANCE_TYPE_DESCRIPTION 1135469
FUNCTIONAL_LOC 1135482
FUNCTIONAL_AREA_NODE_1_MODIFIED 1137147
FUNCTIONAL_AREA_NODE_2_MODIFIED 1144857
FUNCTIONAL_AREA_NODE_3_MODIFIED 1168573
FUNCTIONAL_AREA_NODE_4_MODIFIED 1189691
FUNCTIONAL_AREA_NODE_5_MODIFIED 1425084
EQUIPMENT_ID 1135469
EQUIPMENT_DESC 1347817
EQUIP_CAT_DESC 1347817
EQUIP_START_UP_DATE 1347817
EQUIP_VALID_FROM 1347817
EQUIP_VALID_TO 1347817

4. Missing Value analysis ¶

In [ ]:
# Calculate the percentage of missing values for each column
missing_values_percentage = data_swire_copy.isnull().mean() * 100

# Filter out columns that have no missing values
missing_values_percentage = missing_values_percentage[missing_values_percentage > 0]

print(missing_values_percentage)
MAINTENANCE_PLAN                   89.308635
MAINTENANCE_ITEM                   89.308635
ORDER_DESCRIPTION                  79.557601
MAINTENANCE_TYPE_DESCRIPTION       79.555639
FUNCTIONAL_LOC                     79.556550
FUNCTIONAL_AREA_NODE_1_MODIFIED    79.673207
FUNCTIONAL_AREA_NODE_2_MODIFIED    80.213401
FUNCTIONAL_AREA_NODE_3_MODIFIED    81.875042
FUNCTIONAL_AREA_NODE_4_MODIFIED    83.354656
FUNCTIONAL_AREA_NODE_5_MODIFIED    99.847260
EQUIPMENT_ID                       79.555639
EQUIPMENT_DESC                     94.433616
EQUIP_CAT_DESC                     94.433616
EQUIP_START_UP_DATE                94.433616
EQUIP_VALID_FROM                   94.433616
EQUIP_VALID_TO                     94.433616
dtype: float64
In [ ]:
# Plotting the bar plot
plt.figure(figsize=(13, 8))
missing_values_percentage.sort_values().plot(kind='barh', color='skyblue')
plt.xlabel('Percentage of Missing Values')
plt.ylabel('Columns')
plt.title('Percentage of Missing Values by Column')

# Add numerical values on the bars
for index, value in enumerate(missing_values_percentage.sort_values()):
    plt.text(value + 0.5, index, f'{value:.2f}%', va='center')

plt.show()
No description has been provided for this image

The missing value analysis reveals that several columns in the dataset have a significant proportion of missing data, ranging from approximately 79.5% to 99.8%. Notably, columns like MAINTENANCE_PLAN and MAINTENANCE_ITEM have around 89.3% missing values, indicating potential data quality issues or inconsistent record-keeping in maintenance documentation. Furthermore, the EQUIP_START_UP_DATE, EQUIP_VALID_FROM, and EQUIP_VALID_TO columns each have 94.4% missing values, which highlights considerable gaps in equipment lifecycle data. The FUNCTIONAL_AREA_NODE_5_MODIFIED column is missing values for almost 99.8% of the records, suggesting that information at this level of detail might not have been recorded consistently or may only apply to specific equipment types. Addressing these missing values will be crucial for ensuring robust data quality and reliability in subsequent analyses and predictive modeling efforts.

4.1 Imputation of Missing Value's ¶

In [ ]:
categorical_columns = ['MAINTENANCE_PLAN', 'MAINTENANCE_ITEM', 'ORDER_DESCRIPTION', 'MAINTENANCE_TYPE_DESCRIPTION',
                       'FUNCTIONAL_LOC', 'FUNCTIONAL_AREA_NODE_1_MODIFIED', 'FUNCTIONAL_AREA_NODE_2_MODIFIED',
                       'FUNCTIONAL_AREA_NODE_3_MODIFIED', 'FUNCTIONAL_AREA_NODE_4_MODIFIED', 'FUNCTIONAL_AREA_NODE_5_MODIFIED',
                       'EQUIPMENT_ID', 'EQUIPMENT_DESC', 'EQUIP_CAT_DESC']

for col in categorical_columns:
    data_swire_copy[col] = data_swire_copy[col].fillna('Unknown')

data_swire_copy['EQUIP_START_UP_DATE'] = pd.to_datetime(data_swire_copy['EQUIP_START_UP_DATE'], errors='coerce')
data_swire_copy['EQUIP_START_UP_DATE'] = data_swire_copy['EQUIP_START_UP_DATE'].fillna(pd.to_datetime('1960-01-01'))

data_swire_copy['EQUIP_VALID_FROM'] = np.where(
    data_swire_copy['EQUIP_VALID_FROM'].isna(),
    data_swire_copy['EQUIP_START_UP_DATE'] + pd.Timedelta(days=0),
    data_swire_copy['EQUIP_VALID_FROM']
)

data_swire_copy.fillna({'EQUIP_VALID_TO': pd.to_datetime('2200-01-01')}, inplace=True)


data_swire_copy['EQUIPMENT_DESC'] = data_swire_copy['EQUIPMENT_DESC'].fillna(data_swire_copy['EQUIPMENT_DESC'].mode()[0])
data_swire_copy['EQUIP_CAT_DESC'] = data_swire_copy['EQUIP_CAT_DESC'].fillna(data_swire_copy['EQUIP_CAT_DESC'].mode()[0])

For categorical columns, such as MAINTENANCE_PLAN, ORDER_DESCRIPTION, and several FUNCTIONAL_AREA_NODE columns, missing values are replaced with 'Unknown' to ensure data completeness without introducing bias. Date columns like EQUIP_START_UP_DATE, EQUIP_VALID_FROM, and EQUIP_VALID_TO are converted to datetime format and missing entries are filled with placeholder dates (e.g., 1900-01-01 for startup dates and 2200-01-01 for end-of-use dates), enabling meaningful time-based analysis while avoiding data loss. For all rows where EQUIP_VALID_FROM is missing, we are imputing the equipment's start-up date (EQUIP_START_UP_DATE) as the valid-from date, meaning the equipment is considered valid from the date it was initially acquired. This ensures there are no gaps in the EQUIP_VALID_FROM column. Additionally, the most common value (mode) is used to fill missing values in EQUIPMENT_DESC and EQUIP_CAT_DESC to retain categorical consistency. These imputation techniques ensure that the dataset remains comprehensive and ready for further exploration and modeling, without dropping any rows or losing essential information.

In [ ]:
data_swire_copy.isnull().sum()
Out[ ]:
0
ORDER_ID 0
PLANT_ID 0
PRODUCTION_LOCATION 0
EXECUTION_START_DATE 0
EXECUTION_FINISH_DATE 0
ACTUAL_START_TIME 0
ACTUAL_FINISH_TIME 0
ACTUAL_WORK_IN_MINUTES 0
MAINTENANCE_PLAN 1274670
MAINTENANCE_ITEM 1274670
MAINTENANCE_ACTIVITY_TYPE 0
ORDER_DESCRIPTION 1135497
MAINTENANCE_TYPE_DESCRIPTION 1135469
FUNCTIONAL_LOC 1135482
FUNCTIONAL_AREA_NODE_1_MODIFIED 1137147
FUNCTIONAL_AREA_NODE_2_MODIFIED 1144857
FUNCTIONAL_AREA_NODE_3_MODIFIED 1168573
FUNCTIONAL_AREA_NODE_4_MODIFIED 1189691
FUNCTIONAL_AREA_NODE_5_MODIFIED 1425084
EQUIPMENT_ID 1135469
EQUIPMENT_DESC 1347817
EQUIP_CAT_DESC 1347817
EQUIP_START_UP_DATE 1347817
EQUIP_VALID_FROM 1347817
EQUIP_VALID_TO 1347817

In [ ]:
data_swire_copy.head(n=10)
Out[ ]:
ORDER_ID PLANT_ID PRODUCTION_LOCATION EXECUTION_START_DATE EXECUTION_FINISH_DATE ACTUAL_START_TIME ACTUAL_FINISH_TIME ACTUAL_WORK_IN_MINUTES MAINTENANCE_PLAN MAINTENANCE_ITEM ... FUNCTIONAL_AREA_NODE_2_MODIFIED FUNCTIONAL_AREA_NODE_3_MODIFIED FUNCTIONAL_AREA_NODE_4_MODIFIED FUNCTIONAL_AREA_NODE_5_MODIFIED EQUIPMENT_ID EQUIPMENT_DESC EQUIP_CAT_DESC EQUIP_START_UP_DATE EQUIP_VALID_FROM EQUIP_VALID_TO
0 705642457 G812 ROMA 2024-05-04 2024-05-12 06:00:00.000 23:04:08.000 390.0 NaN NaN ... G812 PRD FILLER_ROTARY_CAN G812 PRD FILLER_ROTARY_CAN LINE 3 CONVEYOR FULL CAN CONVEYOR 300126812.0 FULL CAN CONVEYOR (FC41B) Machines 2020-04-22 2020-02-10 9999-12-31
1 704191697 G812 ROMA 2022-09-13 2022-09-13 06:00:00.000 17:17:24.000 420.0 NaN NaN ... G812 PRD FILLER_ROTARY_CAN G812 PRD FILLER_ROTARY_CAN LINE 3 CONVEYOR FULL CAN CONVEYOR 300126812.0 FULL CAN CONVEYOR (FC41B) Machines 2020-04-22 2020-02-10 9999-12-31
2 704466547 G812 ROMA 2022-12-21 2022-12-21 07:00:00.000 07:00:00.000 30.0 NaN NaN ... G812 PRD FILLER_ROTARY_CAN G812 PRD FILLER_ROTARY_CAN LINE 3 CONVEYOR FULL CAN CONVEYOR 300126812.0 FULL CAN CONVEYOR (FC41B) Machines 2020-04-22 2020-02-10 9999-12-31
3 703834477 G812 ROMA 2022-07-04 2022-07-04 06:00:00.000 06:00:00.000 60.0 NaN NaN ... G812 PRD FILLER_ROTARY_CAN G812 PRD FILLER_ROTARY_CAN LINE 3 CONVEYOR FULL CAN CONVEYOR 300126812.0 FULL CAN CONVEYOR (FC41B) Machines 2020-04-22 2020-02-10 9999-12-31
4 704661125 G291 MONZA 2023-03-15 2023-03-15 07:00:00.000 07:00:00.000 120.0 NaN NaN ... AIR SYSTEMS NaN NaN NaN 300001088.0 TANK_STL_STOR_AIR Machines 2017-02-15 2023-12-11 9999-12-31
5 704948720 G291 MONZA 2023-09-08 2023-09-08 07:00:00.000 07:00:00.000 18.0 G291VM1437 40534.0 ... AIR SYSTEMS NaN NaN NaN 300001088.0 TANK_STL_STOR_AIR Machines 2017-02-15 2023-12-11 9999-12-31
6 701673970 G291 MONZA 2019-09-06 2019-09-06 07:00:00.000 07:00:00.000 480.0 NaN NaN ... AIR SYSTEMS NaN NaN NaN 300001088.0 TANK_STL_STOR_AIR Machines 2017-02-15 2023-12-11 9999-12-31
7 705148892 G291 MONZA 2023-10-09 2023-10-09 07:00:00.000 07:00:00.000 30.0 NaN NaN ... AMMONIA & REFRIGERATION SYSTEMS NaN NaN NaN 300001033.0 PUMP_RECIP_HIGH_PRESSURE_NF Machines 2017-02-15 2023-12-11 9999-12-31
8 704769592 G291 MONZA 2023-05-01 2023-05-01 07:00:00.000 07:00:00.000 30.0 NaN NaN ... AMMONIA & REFRIGERATION SYSTEMS NaN NaN NaN 300001033.0 PUMP_RECIP_HIGH_PRESSURE_NF Machines 2017-02-15 2023-12-11 9999-12-31
9 704448350 G291 MONZA 2022-12-13 2022-12-13 08:00:00.000 08:00:00.000 30.0 NaN NaN ... AMMONIA & REFRIGERATION SYSTEMS NaN NaN NaN 300001033.0 PUMP_RECIP_HIGH_PRESSURE_NF Machines 2017-02-15 2023-12-11 9999-12-31

10 rows × 25 columns

5.Univariate analysis ¶

In the context of predictive maintenance for Swire Coca-Cola, univariate analysis is essential to understand the distribution of each feature, such as maintenance duration (ACTUAL_WORK_IN_MINUTES) or maintenance types. By analyzing each variable independently, we can identify key characteristics, such as the frequency of unplanned vs. planned maintenance or the age distribution of the equipment.

In [ ]:
data_swire_copy['ACTUAL_WORK_IN_MINUTES'].describe().round(2)
Out[ ]:
ACTUAL_WORK_IN_MINUTES
count 1427264.00
mean 88.63
std 803.79
min 0.00
25% 12.00
50% 48.00
75% 90.00
max 330184.80

The summary statistics for ACTUAL_WORK_IN_MINUTES show that the dataset contains 142,264 records for the duration of work performed on maintenance tasks, with a mean of 88.63 minutes. The standard deviation is 803.79 minutes, indicating a high variability in the time taken for maintenance activities. The minimum recorded value is 0 minutes, suggesting instances where no significant work was logged, which may represent very minor fixes or erroneous data. The median (50th percentile) value is 48 minutes, while the 25th and 75th percentiles are 12 minutes and 90 minutes, respectively, showing that most maintenance tasks are relatively short. However, the maximum value of 330,184.80 minutes highlights some extreme outliers, possibly due to extended downtimes or misrecorded data, which may require further investigation and handling during analysis. Overall, these statistics suggest a skewed distribution, with most maintenance activities being short in duration, while a few instances are significantly longer.

5.1. Distribution of actual work in minutes ¶

In [ ]:
warnings.filterwarnings("ignore", category=FutureWarning, module="pandas")

sns.histplot(data_swire_copy['ACTUAL_WORK_IN_MINUTES'], bins=30, kde=True)
plt.title('Distribution of Actual Work in Minutes')
plt.xlabel('Actual Work in Minutes')
plt.ylabel('Frequency')
plt.show()
No description has been provided for this image

From the above graph, which is between actual work in minutes and frequency, we can see that there is a massive spike at zero, which means that most of the downtimes which occur are small fixes or minor issues that require significantly less time to resolve the problem which doesn't affect the production. However, the outliers which exceed more than 1000 minutes are likely to be unplanned maintenance, where the repairs are difficult and take a considerable amount of time to do so, which affects the production and the cost in the millions, even if these repairs are rare but predicting them in advance can reduce the impact they cause.

5.2. Breakdown counts by production location ¶

In [ ]:
plt.figure(figsize=(12, 6))
location_counts = data_swire_copy['PRODUCTION_LOCATION'].value_counts()
sns.barplot(x=location_counts.values, y=location_counts.index, palette='viridis')
plt.title('Breakdown Counts by Production Location')
plt.xlabel('Number of Breakdowns')
plt.ylabel('Production Location')
plt.tight_layout()
plt.show()
No description has been provided for this image

The bar chart shows the breakdown counts by production location, with "Silverstone" having the highest number of breakdowns, significantly surpassing other locations. "Suzuka" follows as the second highest, and "Monza" ranks third. The other locations—"COTA," "Monaco," and "Roma"—have relatively lower breakdown counts compared to the top three. This visualization highlights that Silverstone is experiencing the most operational issues, indicating the need for focused maintenance strategies to address the frequent breakdowns and improve efficiency in this location.

5.3. Breakdown trends over years ¶

In [ ]:
# Convert 'EXECUTION_START_DATE' to datetime format
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')

# Grouping breakdowns by year
breakdown_trends_yearly = data_swire_copy['EXECUTION_START_DATE'].groupby(data_swire_copy['EXECUTION_START_DATE'].dt.year).count()

# Plotting breakdown trends over time by year
plt.figure(figsize=(12, 6))
breakdown_trends_yearly.plot(kind='line', marker='o', color='b')
plt.title('Breakdown Trends Over Years')
plt.xlabel('Year')
plt.ylabel('Number of Breakdowns')
plt.xticks(breakdown_trends_yearly.index, rotation=45)

plt.tight_layout()
plt.show()
No description has been provided for this image

The line graph depicts the breakdown trends over the years, showing a notable increase in the number of breakdowns from 2015 to 2018, reaching its peak around 2019. This indicates a period of operational challenges and frequent machine failures. Post-2019, there seems to be a stabilization in the number of breakdowns, although still relatively high, with slight fluctuations between 2020 and 2023. In 2024, a significant decrease in breakdowns is observed, suggesting possible improvements in maintenance practices or a decline in production activities that could have reduced the number of incidents. This trend emphasizes the need for further analysis of factors contributing to the fluctuations in breakdown occurrences, especially during peak periods.

5.4. Number of maintenance activites per month in 2023 ¶

In [ ]:
# Assuming 'EXECUTION_START_DATE' is already converted to datetime
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'])

# Filter data for a specific year
specific_year = 2023
data_specific_year = data_swire_copy[data_swire_copy['EXECUTION_START_DATE'].dt.year == specific_year]

# Grouping by month within the specific year
monthly_counts = data_specific_year['EXECUTION_START_DATE'].groupby(data_specific_year['EXECUTION_START_DATE'].dt.month).count()

# Plotting the number of maintenance activities per month in the specific year
monthly_counts.plot(kind='line', marker='o', figsize=(10, 6))
plt.title(f'Number of Maintenance Activities per Month in {specific_year}')
plt.xlabel('Month')
plt.ylabel('Number of Maintenance Activities')
plt.xticks(ticks=range(1, 13), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.tight_layout()
plt.show()
No description has been provided for this image

The line graph depicts the number of maintenance activities per month for the year 2023. Notably, there are fluctuations throughout the year, with peaks in March and August, and a significant drop in maintenance activities in July. This trend may suggest that preventive maintenance or machine installations done in certain months, such as June, effectively reduced downtime in the following months. The noticeable drop in maintenance activities in July after high activity in June may indicate that the measures taken during June were successful in temporarily reducing the need for maintenance. Similarly, the increase in maintenance activities in subsequent months, such as August and November, points to the need for more consistent preventive measures to ensure stability in machine operations. Overall, the data suggests that timely intervention can lead to reductions in downtime, but continuous monitoring and preventive actions are necessary to maintain optimal operational efficiency.

5.5. Frequency of maintenance activity type ¶

In [ ]:
# Bar chart for MAINTENANCE_ACTIVITY_TYPE
plt.figure(figsize=(10, 6))
sns.countplot(x='MAINTENANCE_ACTIVITY_TYPE', data=data_swire_copy)
plt.title('Frequency of Maintenance Activity Type')
plt.xlabel('Maintenance Activity Type')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.ticklabel_format(style='plain', axis='y')
plt.show()
No description has been provided for this image

The bar chart shows the frequency of maintenance activities, categorizing them into "Planned" and "Unplanned" types. It is evident that the majority of maintenance activities are unplanned, with over 1.2 million occurrences compared to significantly fewer planned maintenance activities. Which suggests that most of the downtimes occur without any warning and can be the leading cause of Financial loss, highlighting the need for predictive maintenance.

In [ ]:
data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'].value_counts()
Out[ ]:
count
MAINTENANCE_ACTIVITY_TYPE
Unplanned 1276877
Planned 150387

The value counts for MAINTENANCE_ACTIVITY_TYPE indicate that the dataset predominantly consists of unplanned maintenance activities, with 1,276,877 occurrences, compared to only 150,387 planned maintenance activities. This suggests that the current maintenance strategy is largely reactive, dealing with breakdowns as they occur rather than proactively planning maintenance to prevent failures. The significantly higher count of unplanned maintenance events highlights an area for improvement, emphasizing the need for a more proactive, predictive maintenance approach to minimize unexpected downtimes and enhance operational efficiency.

6.Bivariate analysis ¶

Bivariate analysis is crucial in our case to explore the relationships between two variables, such as the connection between equipment age (EQUIP_AGE) and downtime (ACTUAL_WORK_IN_MINUTES), or between MAINTENANCE_ACTIVITY_TYPE (planned vs. unplanned) and downtime. Understanding these relationships helps us identify which features have the most influence on machine breakdowns or prolonged downtimes, allowing us to determine key drivers behind equipment failures.

6.1 Average downtime by maintenance activity type ¶

In [ ]:
# Grouping the data by MAINTENANCE_ACTIVITY_TYPE and calculating the mean of ACTUAL_WORK_IN_MINUTES
grouped_data = data_swire_copy.groupby('MAINTENANCE_ACTIVITY_TYPE')['ACTUAL_WORK_IN_MINUTES'].mean().reset_index()

# Plotting the average downtime (ACTUAL_WORK_IN_MINUTES) for each Maintenance Activity Type
plt.figure(figsize=(12, 8))
bar_plot = sns.barplot(x='MAINTENANCE_ACTIVITY_TYPE', y='ACTUAL_WORK_IN_MINUTES', data=grouped_data, palette='viridis')

# Customizing the plot
plt.title('Average Downtime (Actual Work in Minutes) by Maintenance Activity Type')
plt.xlabel('Maintenance Activity Type')
plt.ylabel('Average Actual Work in Minutes')
plt.xticks(rotation=45)

# Adding values to each bar in the plot
for index, row in grouped_data.iterrows():
    bar_plot.text(index, row['ACTUAL_WORK_IN_MINUTES'], f'{row["ACTUAL_WORK_IN_MINUTES"]:.2f}',
                  ha='center', va='bottom', fontsize=10, color='black')

plt.tight_layout()
plt.show()
No description has been provided for this image

From the above graph of average downtime by maintenance activity, we can see that for unplanned downtimes, the average time required to resolve the issues is 93.41 minutes, whereas for planned activities, it is 48.03 minutes. Therefore, emphasizing more planned maintenance activities could substantially reduce downtime, enhance operational stability, and lower maintenance-related costs for Swire Coca-Cola.

6.2 Average downtime by plant ¶

In [ ]:
grouped_data = data_swire_copy.groupby('PLANT_ID')['ACTUAL_WORK_IN_MINUTES'].mean().reset_index()

# Re-plotting as requested without numerical values and with red color bars
plt.figure(figsize=(12, 6))
bar_plot = sns.barplot(
    x='PLANT_ID',
    y='ACTUAL_WORK_IN_MINUTES',
    data=grouped_data,
    color='red',  # Use solid red color
    errorbar=None
)

# Customizing the plot
plt.title('Average Downtime by Plant', fontsize=14)
plt.xlabel('Plant ID', fontsize=12)
plt.ylabel('Average Actual Work in Minutes', fontsize=12)
plt.xticks(rotation=45, fontsize=10)
plt.yticks(fontsize=10)

# Adjust layout and show the plot
plt.tight_layout()
plt.show()
No description has been provided for this image

From the above graph of average downtime by plant, we can see that plant G221 has the highest average downtime of 100 minutes, where plants G261 and G291 are six minutes behind, and plant G812 has the lowest downtime of 50.14. From this, the plants G221, G261 and G291 need more attention to reduce the average time by anticipating the breakdowns beforehand.

6.3 Count of Maintenance type by maintenance activity type ¶

In [ ]:
plt.figure(figsize=(12, 6))
count_plot = sns.countplot(x='MAINTENANCE_ACTIVITY_TYPE', hue='MAINTENANCE_TYPE_DESCRIPTION', data=data_swire_copy, palette='viridis')
plt.title('Count of Maintenance Type by Maintenance Activity Type')
plt.xlabel('Maintenance Activity Type')
plt.ylabel('Count')
plt.xticks(rotation=45)
for p in count_plot.patches:
    count_plot.annotate(format(p.get_height(), '.0f'),
                        (p.get_x() + p.get_width() / 2., p.get_height()),
                        ha='center', va='center', xytext=(0, 6), textcoords='offset points', fontsize=9, color='black')
plt.tight_layout()
plt.show()
No description has been provided for this image

The bar chart provides a breakdown of maintenance activities by type. The majority of unplanned maintenance activities are classified as "Unknown," with over 1.1 million instances. This highlights a significant gap in documenting the specific type of maintenance performed, suggesting the need for better classification and tracking processes. Meanwhile, preventive maintenance activities account for a much smaller share compared to corrective maintenance orders, indicating that proactive measures are being underutilized. Increasing planned maintenance could help reduce the frequency of corrective and unknown maintenance activities, improving operational efficiency.

6.4 Production line Analysis using Functional Area ¶

The reason for choosing Functional Area Nodes 4 and 5 in the analysis is because these nodes represent more granular levels of the production and equipment hierarchy, providing deeper insights into the production line's operations. Functional Area Node 4 typically corresponds to specific processes or sub-processes within a plant, while Node 5 further narrows down to individual machine types or components. By analyzing Nodes 4 and 5, we can identify which specific parts of the production line or machinery are contributing the most to downtime, allowing for targeted interventions.

In [ ]:
print(data_swire_copy['FUNCTIONAL_AREA_NODE_4_MODIFIED'].value_counts())
FUNCTIONAL_AREA_NODE_4_MODIFIED
FILLER                      46322
PACKER                      35230
CONVEYOR                    34765
LABELER                     18945
PALLETIZER                  16239
                            ...  
MIXERS                          1
BATCH STATION METER SKID        1
G811 CONDENSATE RETURN          1
CIP TANK SKID                   1
CIP VALVE MATRIX SKID           1
Name: count, Length: 86, dtype: int64
In [ ]:
print(data_swire_copy['FUNCTIONAL_AREA_NODE_5_MODIFIED'].value_counts())
FUNCTIONAL_AREA_NODE_5_MODIFIED
FULL CAN TO ACCUMULATION TABLE              251
PACKER 1, MODEL DUODOZEN 1250SX             210
FULL CASE CONVEYORS                         158
EMPTY CASE CONVEYORS                        136
FULL CASE SPIRAL TO PALLETIZER              125
FB CONVEYORS TO DOUGLAS/HI-CONE SPLIT       114
EMPTY CAN                                   110
COMPRESSOR                                  110
PACKER 2, MODEL FENIX 185V                  108
FULL CASE CONVEYOR                          101
EAST SYRUP TANKS                             98
FULL BOTTLE CONVEYORS                        97
EMPTY CAN CONVEYOR                           90
*OLD SYSTEM MT CASE CONVEYORS*               56
LOOSE FB TO PACKER CONVEYORS                 52
FULL CAN CONVEYOR                            50
EVANS COOLING TUNNEL L3, INSIDE BUILDING     48
HARTNESS TO PAI FULL CASE CONVEYORS          47
PARTIAL CASES GPI (RIVERWOOD) TO DOUGLAS     43
INFEED TO HICONE TO PACKER                   33
EMPTY CAN VACUUM BLOWERS                     26
FULL CAN INFEED GPI                          24
DEPAL TO FILLER MT BOTTLE CONVEYORS          17
ACCUMULATION TABLES                          16
FULL CAN INFEED DOUGLAS/GPI (RIVERWOOD)      12
FULL CASE GPI (RIVERWOOD)                    10
FULL CASE FROM DOUGLAS TO PAI                 9
FULL CASE DOUGLAS                             7
PACK CONVEYOR                                 6
EVANS COOLING TUNNEL L3, OUTSIDE BUILDIN      5
CONVEYORS FOR PALLETIZER                      4
BLOWER SECTION #12 ACE-3-M112                 3
FB INFEED TO DOUGLAS CONVEYORS                1
BLOWER SECTION #1 ACE-3-M101                  1
BLOWER SECTION #11 ACE-3-M111                 1
PALLET CONVEYOR                               1
Name: count, dtype: int64

6.4.1 Average downtime by functional area node 4 ¶

In [ ]:
avg_downtime = data_swire_copy.groupby('FUNCTIONAL_AREA_NODE_4_MODIFIED')['ACTUAL_WORK_IN_MINUTES'].mean().reset_index()

plt.figure(figsize=(12, 6))
sns.barplot(x='FUNCTIONAL_AREA_NODE_4_MODIFIED', y='ACTUAL_WORK_IN_MINUTES', data=avg_downtime, palette='viridis')
plt.title('Average Downtime by Functional Area Node 4')
plt.xlabel('Functional Area Node 4')
plt.ylabel('Average Actual Work in Minutes')
plt.xticks(rotation=90)
plt.show()
No description has been provided for this image

From the above graph, average downtime by functional area node four, we can see the variation of downtime in various areas, where the average downtime is 50 to 100 minutes for some regions. The data reveals that "HVAC/SWAMP COOLERS" experienced the highest average downtime of approximately 815 minutes, indicating a significant maintenance burden. Several other functional areas, such as "BATCH STATION METER XX" and "G811 COMPRESSOR SYSTEMS," also show extended downtimes exceeding 400 minutes, which highlights potential inefficiencies in these nodes. Understanding these prolonged downtimes can help target specific areas for optimization and preventive maintenance strategies to reduce operational delays and improve overall plant efficiency.

6.4.2 Average downtime by functional area node 5 ¶

In [ ]:
# Grouping the data by FUNCTIONAL_AREA_NODE_5_MODIFIED and calculating the average downtime (ACTUAL_WORK_IN_MINUTES)
average_downtime_node5 = data_swire_copy.groupby('FUNCTIONAL_AREA_NODE_5_MODIFIED')['ACTUAL_WORK_IN_MINUTES'].mean().reset_index()

# Sorting the values for better visualization
average_downtime_node5 = average_downtime_node5.sort_values(by='ACTUAL_WORK_IN_MINUTES', ascending=False)

# Plotting the average downtime by Functional Area Node 5
plt.figure(figsize=(12, 8))
sns.barplot(x='ACTUAL_WORK_IN_MINUTES', y='FUNCTIONAL_AREA_NODE_5_MODIFIED', data=average_downtime_node5, palette='viridis')
plt.title('Average Downtime by Functional Area Node 5')
plt.xlabel('Average Actual Work in Minutes')
plt.ylabel('Functional Area Node 5')

for index, value in enumerate(average_downtime_node5['ACTUAL_WORK_IN_MINUTES']):
    plt.text(value, index, f'{value:.2f}', va='center', ha='left')

plt.tight_layout()
plt.show()
No description has been provided for this image

The bar chart visualizes the average downtime by various functional area nodes. It highlights that "CONVEYORS FOR PALLETIZER" and "COMPRESSOR" have the highest average downtime, indicating these areas experience significant maintenance delays. Conversely, equipment like "BLOWER SECTION #1 ACE-3-M101" and "PALLET CONVEYOR" have minimal downtime, suggesting they are more efficient or require less maintenance. This information is crucial for prioritizing maintenance efforts and focusing on areas that cause the most disruptions to improve overall efficiency.

6.5 Frequency of Maintenance Type Across Equipment Categories ¶

In [ ]:
plt.figure(figsize=(12, 6))
count_plot = sns.countplot(x='EQUIP_CAT_DESC', hue='MAINTENANCE_TYPE_DESCRIPTION', data=data_swire_copy, palette='viridis')
plt.title('Frequency of Maintenance Type Across Equipment Categories')
plt.xlabel('Equipment Category')
plt.ylabel('Count')
plt.xticks(rotation=45)

# Adding values on top of each bar
for p in count_plot.patches:
    count_plot.annotate(format(p.get_height(), '.0f'),
                        (p.get_x() + p.get_width() / 2., p.get_height()),
                        ha='center', va='center', xytext=(0, 5), textcoords='offset points')

plt.tight_layout()
plt.show()
No description has been provided for this image

This bar plot illustrates the frequency of different maintenance types across various equipment categories. The "Unknown" category dominates the chart, with over 1.1 million occurrences, which highlights potential data quality issues or gaps in proper classification. The "Machines" category shows a relatively high count of corrective maintenance compared to preventive, indicating that these assets are primarily addressed post-failure, which suggests inefficiencies in maintenance practices. It is also clear that other equipment categories, such as "Production resources/tools," "Test measurement equipment," and "Plant & Buildings," have extremely low or no maintenance records, which could either reflect lack of incidents or insufficient logging of maintenance activities. Addressing these unknowns and optimizing preventive measures could reduce unexpected breakdowns, thereby improving productivity and efficiency.

6.6 Scatter Plot of Equipment Age vs Actual Work in Minutes ¶

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy['EQUIP_VALID_FROM'] = pd.to_datetime(data_swire_copy['EQUIP_VALID_FROM'], errors='coerce')

# Calculate equipment age in years
data_swire_copy['EQUIP_AGE'] = (data_swire_copy['EXECUTION_START_DATE'] - data_swire_copy['EQUIP_VALID_FROM']).dt.days / 365

# Filter for records where equipment age is greater than 0
filtered_data = data_swire_copy[data_swire_copy['EQUIP_AGE'] > 0]

# Scatter plot of equipment age vs actual work in minutes
plt.figure(figsize=(10, 6))
sns.scatterplot(x='EQUIP_AGE', y='ACTUAL_WORK_IN_MINUTES', data=filtered_data, alpha=0.6)
plt.title("Scatter Plot of Equipment Age vs Actual Work in Minutes")
plt.xlabel("Equipment Age (years)")
plt.ylabel("Actual Work in Minutes")
plt.show()
No description has been provided for this image

Observing the data, there is no clear linear or consistent trend indicating that older equipment consistently requires more maintenance time. However, a few scattered points show higher actual work minutes even for relatively young equipment (0-2 years), which may indicate isolated instances of breakdowns or extensive repairs. Most of the data points lie within a lower range of work minutes across all equipment ages, suggesting that, on average, equipment age alone might not be a primary factor in determining maintenance workload.

7.Multivariate analysis ¶

In our predictive maintenance project, multivariate analysis is performed to understand how multiple factors interact and collectively influence outcomes like downtime and maintenance type. For instance, we can analyze the combined effect of equipment age, maintenance type, and functional area on the likelihood of machine failure. This analysis helps to identify complex relationships and dependencies that might be affecting machine performance and downtime.

7.1 Average Downtime by Maintenance Activity Type for Different Plants ¶

In [ ]:
# Grouping data by MAINTENANCE_ACTIVITY_TYPE and PLANT_ID and calculating the mean of ACTUAL_WORK_IN_MINUTES
grouped_data = data_swire_copy.groupby(['MAINTENANCE_ACTIVITY_TYPE', 'PLANT_ID'])['ACTUAL_WORK_IN_MINUTES'].mean().reset_index()

# Plotting a bar plot with hue to differentiate plants
plt.figure(figsize=(12, 6))
sns.barplot(x='MAINTENANCE_ACTIVITY_TYPE', y='ACTUAL_WORK_IN_MINUTES', hue='PLANT_ID', data=grouped_data, palette='viridis')

# Customizing the plot
plt.title('Average Downtime by Maintenance Activity Type for Different Plants')
plt.xlabel('Maintenance Activity Type')
plt.ylabel('Average Actual Work in Minutes')
plt.xticks(rotation=45)
plt.legend(title='Plant ID')
plt.tight_layout()
plt.show()
No description has been provided for this image

From the above graph of average downtime by maintenance activity type for different plants, we can see that the G291, G221, and G261 plants tend to have the highest unplanned downtime with an average time of 100 minutes for Maintenance, which says that these plants need more attention as the unplanned downtimes tend to take more time to resolve the issue.

7.2 Maintenance Type by Equipment Age and Downtime ¶

In [ ]:
# Filter for records where equipment age is greater than 0
filtered_data = data_swire_copy[data_swire_copy['EQUIP_AGE'] > 0]

# Scatter plot to visualize the relationship between equipment age and maintenance type
plt.figure(figsize=(10, 6))
sns.scatterplot(x='EQUIP_AGE', y='ACTUAL_WORK_IN_MINUTES', hue='MAINTENANCE_TYPE_DESCRIPTION', data=filtered_data, alpha=0.6, palette='viridis')
plt.title('Maintenance Type by Equipment Age and Downtime (Age > 0)')
plt.xlabel('Equipment Age (Years)')
plt.ylabel('Actual Work in Minutes')
plt.tight_layout()
plt.show()
No description has been provided for this image

The data points are color-coded based on the maintenance type, including corrective, preventive, administrative, and breakdown maintenance orders. Most of the points, regardless of maintenance type, cluster at lower downtime values, especially around the 0-2 year mark. There are scattered instances of high downtime, particularly for younger equipment, indicating occasional extensive repairs or maintenance requirements even at an early age. Breakdown maintenance, which involves unplanned repairs, shows significant downtime spread across various equipment ages, potentially contributing to increased operational disruptions. Preventive and corrective maintenance are generally clustered towards lower downtime, suggesting they might be more manageable and less time-intensive compared to breakdowns. The plot suggests that unplanned breakdown maintenance tends to take longer to resolve, underscoring the importance of preventive maintenance strategies to minimize unexpected high downtime events.

8. Outlier Analysis ¶

Outlier analysis is crucial in our predictive maintenance case for Swire Coca-Cola, as it helps identify unusual patterns or anomalies that could indicate underlying issues in machine performance or data inconsistencies.

In [ ]:
# Calculate Q1  and Q3
Q1 = data_swire_copy['ACTUAL_WORK_IN_MINUTES'].quantile(0.25)
Q3 = data_swire_copy['ACTUAL_WORK_IN_MINUTES'].quantile(0.75)
IQR = Q3 - Q1

# Define lower and upper bounds for outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Identify outliers
outliers = data_swire_copy[(data_swire_copy['ACTUAL_WORK_IN_MINUTES'] < lower_bound) |
                           (data_swire_copy['ACTUAL_WORK_IN_MINUTES'] > upper_bound)]
print(f"Number of outliers: {len(outliers)}")
Number of outliers: 101066
In [ ]:
# Remove outliers from the dataset
data_swire_copy_no_outliers = data_swire_copy[(data_swire_copy['ACTUAL_WORK_IN_MINUTES'] >= lower_bound) &
                                              (data_swire_copy['ACTUAL_WORK_IN_MINUTES'] <= upper_bound)]
print(f"Number of rows after removing outliers: {len(data_swire_copy_no_outliers)}")
Number of rows after removing outliers: 1326198

In the analysis of outliers for the ACTUAL_WORK_IN_MINUTES variable, a total of 101,066 outliers were identified. This suggests that a significant number of observations fall outside the typical range defined by the interquartile range (IQR) method, which could indicate unusual patterns or anomalies. Removing these outliers leaves us with 1,326,198 data points, which is a considerable amount, ensuring that our dataset still captures enough variability while reducing the influence of extreme values that could distort our analysis.

In [ ]:
# Boxplot after handling outliers
plt.figure(figsize=(12, 6))
sns.boxplot(x=data_swire_copy['ACTUAL_WORK_IN_MINUTES'], palette='viridis')
plt.title('Boxplot of Actual Work in Minutes After Handling Outliers')
plt.xlabel('Actual Work in Minutes')
plt.show()
No description has been provided for this image

The boxplot of the "Actual Work in Minutes" after handling outliers shows that the data contains significant extreme values. The majority of the actual work minutes are concentrated near the lower end of the scale, with a very long tail extending towards higher values, indicating the presence of several high outliers. These extreme outliers go up to over 300,000 minutes, which represents unusually long durations of work. The whiskers of the boxplot are very short in comparison, suggesting that most of the data points are much smaller, while the extreme values pull the distribution to the right. After removing some outliers, the dataset was reduced to 1,326,198 rows, showing that outlier handling was done conservatively, leaving many influential outliers for further analysis.

Note: In our analysis of building predictive maintenance models, we have chosen not to impute missing values in the dataset. Given the large volume of data, imputing missing values with a placeholder like "unknown" could introduce significant inaccuracies, especially with notable percentages of missing data in critical variables. In our case, with a significant portion of missing data across critical variables (e.g., 89.31% in MAINTENANCE_PLAN and MAINTENANCE_ITEM and 79.56% in ORDER_DESCRIPTION), imputing these values with placeholders would likely dilute the model’s predictive power. As discussed in “Applied Predictive Modeling” by Max Kuhn and Kjell Johnson, industrial data requires precise handling since missing values in operational data can lead to “false positive” maintenance alerts or inaccurate risk assessments. Imputing unknowns in these settings may fail to reflect true failure patterns and could result in the model misinterpreting downtime risk, leading to overestimated or underestimated maintenance needs.

Additionally, due to the considerable data drift and the high percentage of missing values, imputing these with placeholders would risk adding substantial inaccuracies. Consistency and accuracy across records are essential, especially for machine-specific details critical in downtime prediction. Imputing values for these variables could lead to incorrect assumptions about machine behavior and obscure the true patterns we aim to capture. By leaving these values unfilled, we preserve data integrity and ensure our predictive models are based on reliable and complete information. This approach helps us maintain accuracy in our analysis, supporting better predictive insights for maintenance.

9. Feature Engineering ¶

We aim to capture insights that align with Swire Coca-Cola’s business objectives. Specifically, we extract information on what machines or components are breaking down, which helps us understand failure patterns. Timestamps of breakdowns allow us to model time-based trends, revealing any seasonal or time-dependent patterns in machine failures. The conditions and descriptions of breakdown events provide clues to underlying causes, while machine locations enable a granular analysis of failure risks. Additionally, the type of fix used in repairs offers insights into effective maintenance strategies, and downtime recorded in actual work minutes quantifies the financial impact of breakdowns. By incorporating these aspects into our dataset, feature engineering makes our predictive models more robust, helping us reduce unplanned downtime, minimize maintenance costs, and enhance production efficiency.

9.1 Breakdown Analysis by Equipment Type ¶

In [ ]:
data_swire_copy.shape
Out[ ]:
(1427264, 26)
In [ ]:
categories = {
    "FILLER": 0,
    "PACKER": 0,
    "LABELER": 0,
    "CAPPER": 0,
    "PALLETIZER": 0,
    "SEAMER": 0,
    "CONVEYOR": 0,
    "RINSER": 0,
    "WARMER": 0,
    "BLENDER": 0,
    "MIXER": 0,
    "TANK": 0,
    "PUMP": 0,
    "COMPRESSOR": 0,
    "CHILLER": 0,
    "BOILER": 0
}

# Convert EQUIPMENT_DESC to uppercase for case-insensitive matching
data_swire_copy['EQUIPMENT_DESC'] = data_swire_copy['EQUIPMENT_DESC'].str.upper()

# Count occurrences for each keyword
for key in categories.keys():
    categories[key] = data_swire_copy['EQUIPMENT_DESC'].str.contains(key, na=False).sum()

# Convert results to DataFrame
categories_df = pd.DataFrame(list(categories.items()), columns=['Equipment_Type', 'Count'])

# Display result
print(categories_df)
   Equipment_Type  Count
0          FILLER   8345
1          PACKER   2851
2         LABELER   5895
3          CAPPER     73
4      PALLETIZER   2512
5          SEAMER   3336
6        CONVEYOR    263
7          RINSER   2725
8          WARMER   1244
9         BLENDER     75
10          MIXER    639
11           TANK   1913
12           PUMP   1120
13     COMPRESSOR    239
14        CHILLER    132
15         BOILER    255

The output shows the count of each type of equipment in the dataset based on the EQUIPMENT_DESC field. Here, "FILLER" equipment has the highest occurrence with 13,723 entries, indicating its frequent use or higher maintenance reporting in the data. Other prominent equipment types include "PACKER" with 6,009 entries, "LABELER" with 7,416, and "SEAMER" with 3,570 entries. Equipment types with lower counts, such as "BLENDER" (256) and "CHILLER" (158), may represent less common machinery or equipment with fewer reported maintenance instances. This breakdown helps in understanding the distribution of equipment types in the dataset, which is essential for maintenance planning and resource allocation.

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# Define keywords to identify equipment types
equipment_keywords = {
    'Filler': 'filler',
    'Packer': 'packer',
    'Blender': 'blender',
    'Blower': 'blower',
    'Labeler': 'labeler'
}

# Extract the necessary columns
equipment_info = data_swire_copy[['EQUIPMENT_ID', 'EQUIPMENT_DESC']].drop_duplicates()

# Initialize a new column for equipment type
equipment_info['Equipment_Type'] = np.nan

# Identify equipment type based on keywords in the description
for equipment_type, keyword in equipment_keywords.items():
    equipment_info.loc[equipment_info['EQUIPMENT_DESC'].str.contains(keyword, case=False, na=False), 'Equipment_Type'] = equipment_type

# Count occurrences of each EQUIPMENT_ID in the original data to measure breakdown frequency
breakdown_count = data_swire_copy['EQUIPMENT_ID'].value_counts().reset_index()
breakdown_count.columns = ['EQUIPMENT_ID', 'Breakdown_Count']

# Merge the breakdown count with the equipment information
equipment_summary = equipment_info.merge(breakdown_count, on='EQUIPMENT_ID', how='left')

# Filter out rows without identified equipment type (if you only want to see matched types)
equipment_summary = equipment_summary.dropna(subset=['Equipment_Type'])

# Rename columns for clarity
equipment_summary.rename(columns={'EQUIPMENT_ID': 'Equipment_ID', 'EQUIPMENT_DESC': 'Equipment_Description'}, inplace=True)

# Display the final table
final_output = equipment_summary[['Equipment_ID', 'Equipment_Description', 'Breakdown_Count', 'Equipment_Type']]
final_output
Out[ ]:
Equipment_ID Equipment_Description Breakdown_Count Equipment_Type
18 300115000.0 L1 FILLER_ROTARY_CAN_72_VALVE 3745.0 Filler
1019 300001576.0 LABELER_SPOT_TAG_APPLICATOR 1.0 Labeler
1228 300111000.0 L2 LABELER_BOTTLE_TRINE 4600_#1 1066.0 Labeler
1231 300001560.0 L3 LABELER_ROTARY 931.0 Labeler
1239 300111001.0 L2 LABELER_BOTTLE_TRINE 4600_#2 902.0 Labeler
... ... ... ... ...
2893 300001559.0 L3 LABELER_ROTARY 1024.0 Labeler
2914 300045259.0 L2 ALLIANCE-CCF201 PACKER DISCHARGE 14.0 Packer
2917 300153418.0 L2 MICROBLEND MICRO2 BLENDER 43.0 Blender
2932 300129003.0 L3 FILLER CIP SYSTEM 16.0 Filler
2947 300126807.0 CAN LINE 3 HICONE PACKER 100.0 Packer

61 rows × 4 columns

The table summarizes equipment types in the facility, focusing on breakdown frequency and categorizing machines as Fillers, Packers, Labelers, Blenders, and Blowers. It shows that fillers and packers are the most breakdown-prone. For example, 300115000.0 (Filler) experienced 3,745 breakdowns, and 300001276.00 (Packer) reported 1,575. This high failure rate suggests a need for prioritized maintenance for these machines to prevent production disruptions.

Labelers, while less frequently breaking down than fillers and packers, still contribute to downtime with breakdown counts over 1,000 for some units, such as 300001559.0 (1,023 breakdowns). They should be included in regular maintenance but don’t require the same intensity as fillers and packers.

In contrast, blenders and blowers show lower breakdown counts, with examples like 300129002.0 (Blender) at 30 and 300060046.0 (Blower) at 17, indicating these are more reliable. Maintenance for these machines can follow a standard schedule, allowing resources to focus on higher-risk equipment.

Overall, the table highlights that fillers and packers need the most attention in maintenance planning, followed by labelers, while blenders and blowers appear less critical for intensive maintenance. This targeted approach helps optimize maintenance resources to minimize downtime and improve production efficiency.

9.2 Breakdown Timeline Analysis ¶

In [ ]:
# Ensure the 'EXECUTION_START_DATE' column is in datetime format
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')

# Extract the year from the 'EXECUTION_START_DATE' column
data_swire_copy['Year'] = data_swire_copy['EXECUTION_START_DATE'].dt.year

# Group by year and count the work orders for each year
yearly_work_orders = data_swire_copy.groupby('Year').size().reset_index(name='Work_Order_Count')

# Display the year-wise work order count
print(yearly_work_orders)
   Year  Work_Order_Count
0  2013                 1
1  2016             20925
2  2017            118305
3  2018            194641
4  2019            203767
5  2020            185807
6  2021            182353
7  2022            176190
8  2023            200874
9  2024            144401

The output shows a year-wise breakdown of work orders, indicating significant fluctuations in maintenance activities over the years. Starting from a minimal count in 2013, there was a sharp increase in 2017, with work orders peaking in 2019 at 203,767. This peak suggests a period of intense maintenance or operational challenges. In subsequent years, they decreased, possibly due to enhanced maintenance strategies or operational improvements. However, 2023 shows another rise, indicating renewed maintenance demands or issues. The data helps identify trends in maintenance workload over time, allowing for resource planning and process optimization in response to high-demand years.

In [ ]:
# Load data with explicit datetime conversion
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')

# Drop any rows with NaT values in EXECUTION_START_DATE
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# Hypothesis 1: Month-wise downtime patterns for each year
def monthly_downtime_by_year(data_swire_copy):
    # Extract year and month separately
    data_swire_copy['year'] = data_swire_copy['EXECUTION_START_DATE'].dt.year
    data_swire_copy['month'] = data_swire_copy['EXECUTION_START_DATE'].dt.month

    # Group by year and month, then count downtime instances
    downtime_counts = data_swire_copy.groupby(['year', 'month']).size().unstack(level=0)

    # Plot month-wise patterns for each year
    downtime_counts.plot(kind='line', figsize=(12, 6), marker='o')
    plt.title("Monthly Downtime Patterns for Each Year")
    plt.xlabel("Month")
    plt.ylabel("Downtime Count")
    plt.legend(title='Year')
    plt.show()

monthly_downtime_by_year(data_swire_copy)
No description has been provided for this image

The chart illustrates monthly downtime patterns across several years, highlighting consistent trends and variations in maintenance downtime. Most years, particularly from 2017 onwards, show relatively stable monthly downtime counts, with levels often remaining around or above 15,000. A noticeable increase in downtime is seen in mid-year months for some years, suggesting seasonal peaks in maintenance activities. 2013 shows unusually low downtime, indicating limited maintenance records or reduced operational activity. Additionally, 2023 exhibits a unique dip towards the end of the year, possibly due to improved efficiency or reduced operational strain. This visualization helps identify high-maintenance periods across years, supporting resource planning for peak maintenance months.

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# Extract year, month, week, and day of the week
data_swire_copy['year'] = data_swire_copy['EXECUTION_START_DATE'].dt.year
data_swire_copy['month'] = data_swire_copy['EXECUTION_START_DATE'].dt.month
data_swire_copy['week'] = data_swire_copy['EXECUTION_START_DATE'].dt.isocalendar().week
data_swire_copy['day_of_week'] = data_swire_copy['EXECUTION_START_DATE'].dt.day_name()

# Count downtime instances per day of the week for each week, month, and year
weekly_downtime = data_swire_copy.groupby(['year', 'month', 'week', 'day_of_week']).size().reset_index(name='downtime_count')

# Reorder day_of_week for proper plotting order
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
weekly_downtime['day_of_week'] = pd.Categorical(weekly_downtime['day_of_week'], categories=day_order, ordered=True)

# Summarize downtime counts across all weeks, months, and years for each day of the week
downtime_summary = weekly_downtime.groupby('day_of_week')['downtime_count'].sum()

# Plot weekly downtime pattern, aggregated across all weeks, months, and years
plt.figure(figsize=(10, 6))
downtime_summary.plot(kind='bar', color='lightcoral')
plt.title("Weekly Downtime Pattern (Aggregated by All Weeks, Months, and Years)")
plt.xlabel("Day of the Week")
plt.ylabel("Total Downtime Count")
plt.xticks(rotation=45)
plt.show()
No description has been provided for this image

The bar chart displays the weekly downtime pattern, aggregating downtime counts across all weeks, months, and years to reveal trends based on the day of the week. The data shows that downtime events are significantly more frequent during weekdays, with the highest counts observed on Tuesday, Wednesday, and Thursday, each surpassing 250,000 events. Friday and Monday also experience high downtime frequencies, though slightly lower than the mid-week peak.

In contrast, weekends show a considerable drop in downtime occurrences. Saturday has moderate downtime levels, while Sunday exhibits the lowest downtime count across the week. This pattern suggests that downtime events are closely linked to production or operational activities, which are likely more intense on weekdays and less active during weekends. This insight could indicate a need for increased maintenance support during the high-activity weekdays to minimize operational disruptions, while weekends could be an optimal time for scheduled maintenance or system checks to prepare for the upcoming week.

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# Extract year and day of the week
data_swire_copy['year'] = data_swire_copy['EXECUTION_START_DATE'].dt.year
data_swire_copy['day_of_week'] = data_swire_copy['EXECUTION_START_DATE'].dt.day_name()

# Count downtime instances per day of the week for each year
yearly_downtime = data_swire_copy.groupby(['year', 'day_of_week']).size().reset_index(name='downtime_count')

# Reorder day_of_week for consistent plotting order
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
yearly_downtime['day_of_week'] = pd.Categorical(yearly_downtime['day_of_week'], categories=day_order, ordered=True)

# Loop through each year and plot downtime pattern for that year
for year in yearly_downtime['year'].unique():
    # Filter data for the specific year
    data_for_year = yearly_downtime[yearly_downtime['year'] == year].set_index('day_of_week').reindex(day_order).fillna(0)

    # Plot for the specific year
    plt.figure(figsize=(8, 5))
    data_for_year['downtime_count'].plot(kind='bar', color='skyblue')
    plt.title(f"Downtime Pattern by Day of the Week for {year}")
    plt.xlabel("Day of the Week")
    plt.ylabel("Downtime Count")
    plt.xticks(rotation=45)
    plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

The series of bar charts displays the weekly downtime patterns across different years, providing insights into which days experience the most downtime. Here’s a summary interpretation based on the patterns observed:

  • Consistent Weekday Pattern: Across all years from 2013 to 2024, weekdays (especially Tuesday, Wednesday, and Thursday) consistently show the highest downtime counts, indicating that most operational activities—and consequently, potential for downtimes—occur during the middle of the week. Monday and Friday also see significant downtime, though generally less than the mid-week peak.

  • Lower Downtime on Weekends: Saturday and Sunday consistently exhibit the lowest downtime counts throughout all years. This pattern suggests reduced operational activity during weekends, likely due to scheduled downtime, reduced production shifts, or maintenance tasks planned to avoid impacting peak operation times.

  • Yearly Variations: Although the general pattern remains consistent, the absolute downtime counts vary by year. For example:In 2013, downtime appears minimal, possibly due to limited data or lower reporting during this initial year. From 2016 onwards, the downtime counts on weekdays generally increase, indicating a possible rise in production activities or increased reporting and tracking of downtime events over time. Strategic Implications: The consistency of high weekday downtimes suggests a need for additional maintenance or support resources on Tuesday through Thursday. Additionally, the relatively low downtime on weekends might be an optimal window for scheduled maintenance to prepare machinery for the upcoming workweek.

Overall, this analysis highlights a stable weekly downtime pattern across the years, with the busiest operational days aligning with the highest downtime counts, while weekends present an opportunity for preventive maintenance or reduced production.

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# Sort data by machine and by date to calculate intervals
data_swire_copy = data_swire_copy.sort_values(by=['EQUIPMENT_ID', 'EXECUTION_START_DATE'])

# Calculate the time difference (in days) between consecutive failures for each machine
data_swire_copy['time_since_last_failure'] = data_swire_copy.groupby('EQUIPMENT_ID')['EXECUTION_START_DATE'].diff().dt.days

# Drop any NaN values that result from the first failure in each machine's series
data_swire_copy = data_swire_copy.dropna(subset=['time_since_last_failure'])

# Calculate average MTBF over time by month or year (adjust as needed)
data_swire_copy['year_month'] = data_swire_copy['EXECUTION_START_DATE'].dt.to_period('M')  # Month-wise aggregation
mtbf_over_time = data_swire_copy.groupby('year_month')['time_since_last_failure'].mean()

# Plot MTBF over time
plt.figure(figsize=(12, 6))
mtbf_over_time.plot(kind='line', marker='o', color='blue')
plt.title("Mean Time Between Failures (MTBF) Over Time")
plt.xlabel("Year-Month")
plt.ylabel("Average MTBF (Days)")
plt.xticks(rotation=45)
plt.grid()
plt.show()
No description has been provided for this image

The graph illustrates the Mean Time Between Failures (MTBF) over time, measured in days, to evaluate equipment reliability across various months and years. From 2017 to early 2024, there is a general upward trend in MTBF, suggesting an improvement in reliability, likely due to more effective maintenance practices or equipment upgrades. However, this upward trend is interspersed with fluctuations. A notable peak occurs in early 2021, where MTBF reaches its highest value, indicating an unusual period of stability with fewer equipment failures. This could have resulted from enhanced maintenance practices, a temporary reduction in operational intensity, or possibly a period of reduced production. After each peak, MTBF tends to decline, as seen in late 2021 and mid-2023, suggesting periods of increased failure frequency possibly due to seasonal production demands, aging equipment, or maintenance delays. Most recently, from mid-2023 to early 2024, MTBF shows a steady decline, indicating a rise in failure frequency, which might signal increased strain on the equipment or a need to revisit maintenance strategies to address the rising failure rates. Overall, while the long-term trend points to improved reliability, these fluctuations highlight critical periods where equipment is more prone to failure, providing valuable insights for maintenance planning and equipment management.

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# 1. Daily, Weekly, and Monthly Aggregations

# Aggregate by day
daily_downtime = data_swire_copy.set_index('EXECUTION_START_DATE').resample('D').size()

# Aggregate by week
weekly_downtime = data_swire_copy.set_index('EXECUTION_START_DATE').resample('W').size()

# Aggregate by month
monthly_downtime = data_swire_copy.set_index('EXECUTION_START_DATE').resample('M').size()

# Plot daily, weekly, and monthly downtime trends
plt.figure(figsize=(15, 10))

plt.subplot(3, 1, 1)
daily_downtime.plot(color='blue')
plt.title("Daily Downtime Events")
plt.xlabel("Date")
plt.ylabel("Downtime Count")

plt.subplot(3, 1, 2)
weekly_downtime.plot(color='orange')
plt.title("Weekly Downtime Events")
plt.xlabel("Date")
plt.ylabel("Downtime Count")

plt.subplot(3, 1, 3)
monthly_downtime.plot(color='green')
plt.title("Monthly Downtime Events")
plt.xlabel("Date")
plt.ylabel("Downtime Count")

plt.tight_layout()
plt.show()

# 2. Day of the Week and Hourly Patterns

# Extract day of the week and hour for downtime events
data_swire_copy['day_of_week'] = data_swire_copy['EXECUTION_START_DATE'].dt.day_name()
data_swire_copy['hour_of_day'] = data_swire_copy['EXECUTION_START_DATE'].dt.hour

# Aggregate by day of the week
downtime_by_day = data_swire_copy['day_of_week'].value_counts().reindex(
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
)

# Aggregate by hour of the day
downtime_by_hour = data_swire_copy['hour_of_day'].value_counts().sort_index()

# Plot day of the week and hourly patterns
plt.figure(figsize=(15, 6))

plt.subplot(1, 2, 1)
downtime_by_day.plot(kind='bar', color='purple')
plt.title("Downtime Events by Day of the Week")
plt.xlabel("Day of the Week")
plt.ylabel("Downtime Count")

plt.subplot(1, 2, 2)
downtime_by_hour.plot(kind='bar', color='red')
plt.title("Downtime Events by Hour of the Day")
plt.xlabel("Hour of the Day")
plt.ylabel("Downtime Count")

plt.tight_layout()
plt.show()
No description has been provided for this image
No description has been provided for this image
  • Daily Downtime Events: The daily downtime graph reveals high variability in downtime counts, with daily values fluctuating between 0 and approximately 100 events. Significant peaks appear sporadically, showing that downtime can reach high levels on certain days, though most days maintain lower counts. From early 2023 onwards, a general increase in daily fluctuations is observed, indicating that downtime events may be occurring more frequently or intensely in recent years.

  • Weekly Downtime Events: The weekly aggregation smooths the daily fluctuations and shows a clear upward trend. Weekly downtime counts range from around 50 events in earlier years to as high as 400 events per week by 2023. This steady increase suggests that, on a week-by-week basis, downtime has become more common or pronounced, especially from late 2022 through 2024.

  • Monthly Downtime Events: The monthly downtime pattern reveals a broader trend with clearer growth over the years. Monthly counts start around 250 in early periods (2017–2018) but rise steadily, peaking near 1,750 events in 2023. This pattern suggests that downtime events have been accumulating on a larger timescale, pointing toward a long-term increase in maintenance or operational issues.

  • Downtime Events by Day of the Week: Downtime events are notably higher on weekdays, with Monday through Friday showing similar levels, each with counts between 12,000 to 14,000 events. Saturday and Sunday show a marked decrease, with Saturday at approximately 9,000 events and Sunday the lowest, around 8,000 events. This suggests a concentrated workload on weekdays, with reduced operations or downtime activities on weekends.

  • Downtime Events by Hour of the Day: This graph aggregates downtime events across all hours, displaying a single high column around 78,000 events, indicating that downtime events are relatively evenly spread across the hours, without distinguishing specific peaks at particular times. This broad aggregation implies that downtime occurs consistently throughout operational hours rather than clustering at specific times.

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# 1. Breakdown Frequency by Machine
machine_breakdowns = data_swire_copy['EQUIPMENT_ID'].value_counts()

# Plot breakdown frequency by machine (Top 10 machines)
plt.figure(figsize=(12, 6))
machine_breakdowns.head(10).plot(kind='bar', color='skyblue')
plt.title("Top 10 Machines with Most Breakdowns")
plt.xlabel("Machine ID")
plt.ylabel("Breakdown Count")
plt.show()

# 2. Mean Time Between Failures (MTBF) Calculation
data_swire_copy = data_swire_copy.sort_values(by=['EQUIPMENT_ID', 'EXECUTION_START_DATE'])
data_swire_copy['time_between_failures'] = data_swire_copy.groupby('EQUIPMENT_ID')['EXECUTION_START_DATE'].diff().dt.days
mtbf_per_machine = data_swire_copy.groupby('EQUIPMENT_ID')['time_between_failures'].mean()

# Plot MTBF for top 10 machines with the highest breakdown frequency
top_machines = machine_breakdowns.head(10).index
mtbf_top_machines = mtbf_per_machine[top_machines].dropna()

plt.figure(figsize=(12, 6))
mtbf_top_machines.plot(kind='bar', color='lightgreen')
plt.title("Mean Time Between Failures (MTBF) for Top 10 Machines")
plt.xlabel("Machine ID")
plt.ylabel("Average Days Between Failures")
plt.show()

# 3. Breakdown Distribution by Type and Cause
# Assuming 'MAINTENANCE_TYPE_DESCRIPTION' (corrective/preventive) and 'ORDER_DESCRIPTION' (description of cause) columns exist
breakdown_type_counts = data_swire_copy['MAINTENANCE_TYPE_DESCRIPTION'].value_counts()
cause_counts = data_swire_copy['ORDER_DESCRIPTION'].value_counts().head(10)

# Plot breakdowns by type and top 10 causes
plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
breakdown_type_counts.plot(kind='bar', color='coral')
plt.title("Breakdown Distribution by Maintenance Type")
plt.xlabel("Maintenance Type")
plt.ylabel("Count")

plt.subplot(1, 2, 2)
cause_counts.plot(kind='bar', color='salmon')
plt.title("Top 10 Breakdown Causes")
plt.xlabel("Cause Description")
plt.ylabel("Count")
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

# 4. Temporal Breakdown Analysis (Monthly, Day of the Week, Hourly Patterns)
# Monthly breakdown pattern
data_swire_copy['year_month'] = data_swire_copy['EXECUTION_START_DATE'].dt.to_period('M')
monthly_breakdowns = data_swire_copy.groupby('year_month').size()
monthly_breakdowns.index = monthly_breakdowns.index.to_timestamp()

plt.figure(figsize=(12, 6))
monthly_breakdowns.plot(color='steelblue')
plt.title("Monthly Breakdown Pattern")
plt.xlabel("Month")
plt.ylabel("Breakdown Count")
plt.show()

# Day of the week pattern
data_swire_copy['day_of_week'] = data_swire_copy['EXECUTION_START_DATE'].dt.day_name()
dow_breakdowns = data_swire_copy['day_of_week'].value_counts().reindex(
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
)

plt.figure(figsize=(10, 5))
dow_breakdowns.plot(kind='bar', color='mediumpurple')
plt.title("Breakdowns by Day of the Week")
plt.xlabel("Day of the Week")
plt.ylabel("Breakdown Count")
plt.show()

# Hourly breakdown pattern
data_swire_copy['hour_of_day'] = data_swire_copy['EXECUTION_START_DATE'].dt.hour
hourly_breakdowns = data_swire_copy['hour_of_day'].value_counts().sort_index()

plt.figure(figsize=(10, 5))
hourly_breakdowns.plot(kind='bar', color='mediumseagreen')
plt.title("Breakdowns by Hour of the Day")
plt.xlabel("Hour of the Day")
plt.ylabel("Breakdown Count")
plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

These graphs offer a comprehensive overview of machine breakdown patterns and insights into maintenance needs. The Top 10 Machines with Most Breakdowns graph highlights machines with the highest failure rates, with machine 300115000.0 leading significantly, followed by 30001113.0 and 30001505.0. Such high breakdown counts indicate these machines may need immediate attention or increased maintenance efforts. The Mean Time Between Failures (MTBF) for Top 10 Machines shows these machines also have low MTBF values, some failing almost daily, further reinforcing the need for preventive measures or possible replacement to avoid operational disruptions.

The Breakdown Distribution by Maintenance Type reveals that preventive maintenance is the most common type, indicating a proactive approach to reduce unexpected breakdowns, though the high count may also imply considerable resource allocation to prevent machine failures. Top 10 Breakdown Causes shed light on recurring issues, with the leading cause, “E-STOP FOR ELEC MONTHLY SHUTDOWN”, suggesting frequent electrical system interruptions, alongside other common causes like conveyor issues and ammonia leaks. Targeting these specific causes in maintenance can help in reducing breakdowns.

The Monthly Breakdown Pattern shows a gradual increase in breakdowns over time, with notable spikes in early 2023 and late 2024, possibly due to machine aging or increased usage. This trend suggests a need to reassess maintenance schedules and potentially replace aging equipment to curb this rise. The Breakdowns by Day of the Week analysis shows Mondays as the most breakdown-prone, which may be due to increased post-weekend operational stress, while weekends see fewer breakdowns, likely due to lower usage. Lastly, Breakdowns by Hour of the Day suggests a relatively uniform distribution across working hours, indicating breakdowns are consistently likely at any operational time. These insights collectively guide targeted maintenance efforts, helping prioritize machines and times of the week when failures are most likely to occur.

In [ ]:
# Ensure EXECUTION_START_DATE is in datetime format
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'])

# Extract month from EXECUTION_START_DATE
data_swire_copy['Month'] = data_swire_copy['EXECUTION_START_DATE'].dt.month

# Count breakdowns or maintenance activities by month
monthly_breakdowns = data_swire_copy.groupby('Month').size()

# Plotting the breakdown counts per month
plt.figure(figsize=(10, 6))
monthly_breakdowns.plot(kind='bar', color='skyblue')
plt.title('Maintenance Activities per Month (All Years Combined)')
plt.xlabel('Month')
plt.ylabel('Count of Maintenance Activities')
plt.xticks(ticks=range(0, 12), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], rotation=45)

# Annotate counts on each bar
for index, value in enumerate(monthly_breakdowns):
    plt.text(index, value, str(value), ha='center', va='bottom')

plt.tight_layout()
plt.show()

# Display which month has the highest number of breakdowns
max_breakdown_month = monthly_breakdowns.idxmax()
max_breakdown_count = monthly_breakdowns.max()
print(f"The month with the highest breakdowns is: {max_breakdown_month} with {max_breakdown_count} activities.")
No description has been provided for this image
The month with the highest breakdowns is: 5 with 26547 activities.

The bar chart shows the distribution of maintenance activities per month, aggregated across all years. June has the highest number of maintenance activities, totaling 6482, suggesting it is a particularly demanding month for maintenance operations. This peak could indicate seasonal factors impacting equipment performance or may reflect scheduling trends where preventive or corrective maintenance is intensified during this period. Following June, May and July also have relatively high activity levels, while November has the lowest count. By understanding these monthly trends, the business can proactively allocate resources, schedule maintenance more effectively, and potentially reduce downtime during peak months.

In [ ]:
# Ensure the date column is in datetime format
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'])

# Group data by maintenance type and month without setting the index
monthly_counts_by_type = data_swire_copy.groupby([pd.Grouper(key='EXECUTION_START_DATE', freq='M'), 'MAINTENANCE_TYPE_DESCRIPTION']).size().unstack(fill_value=0)

# Plotting time series for each maintenance type
plt.figure(figsize=(14, 8))
monthly_counts_by_type.plot(ax=plt.gca(), linewidth=2)

# Adding title and labels
plt.title('Monthly Maintenance Events by Maintenance Type')
plt.xlabel('Date')
plt.ylabel('Number of Maintenance Events')
plt.legend(title='Maintenance Type Description')
plt.tight_layout()
plt.show()
No description has been provided for this image

The time series plot illustrates the monthly trends in maintenance events from 2016 to 2024, with notable variations across maintenance types. Corrective Maintenance Orders (red line) and Preventive Maintenance Orders (purple line) dominate the trend, indicating that these two types are the most frequent maintenance activities. Corrective maintenance displays a significant upward trend, especially around late 2022 to early 2023, peaking at around 3,000 events per month before slightly declining in 2024. This spike may suggest increased equipment failures or urgent repairs during that period. In contrast, preventive maintenance has been relatively steady over the years but also increased, peaking at just under 2,000 events around the same timeframe. Corrective and preventive maintenance are essential to sustaining operations, with corrective maintenance particularly reactive to issues as they arise. On the other hand, preventive maintenance indicates efforts to address potential problems proactively, but further enhancement may be needed to prevent the high volume of corrective orders seen in recent years.

Breakdown Maintenance Orders (blue line), while less frequent, have gradually increased over time, indicating occasional major failures that require immediate attention, although their volume remains far lower than corrective or preventive maintenance. Administrative Orders and Capital Orders (green and orange lines, respectively) are almost negligible, suggesting that routine administrative tasks and capital projects are insignificant contributors to overall maintenance activities.

In [ ]:
# Calculate a 3-month rolling average for each maintenance type
rolling_avg_by_type = monthly_counts_by_type.rolling(window=3).mean()

# Plotting the rolling average for each maintenance type
plt.figure(figsize=(14, 8))
rolling_avg_by_type.plot(ax=plt.gca(), linewidth=2)

# Adding title and labels
plt.title('3-Month Rolling Average of Maintenance Events by Maintenance Type')
plt.xlabel('Date')
plt.ylabel('Average Number of Maintenance Events')
plt.legend(title='Maintenance Type Description')
plt.tight_layout()
plt.show()
No description has been provided for this image

The 3-month rolling average smooths out the fluctuations in monthly maintenance events, providing a clearer view of trends over time. Corrective maintenance shows a sharp rise around late 2022, peaking and then declining slightly into 2024, but it remains the most frequent maintenance type. Preventive maintenance follows a steady pattern, with gradual increases and small dips, but remains consistently high, showing a stable approach to proactive maintenance. While less frequent, breakdown maintenance shows a steady upward trend, suggesting a growing need for urgent repairs. Administrative and capital orders remain minimal throughout, indicating they are not primary drivers in maintenance activities. This rolling average highlights a reactive maintenance focus, with corrective actions peaking, though preventive efforts remain strong to mitigate breakdowns potentially.

9.3 Machine Location and Granular Analysis ¶

In [ ]:
work_orders_by_plant_machine = data_swire_copy.groupby(['PLANT_ID', 'EQUIPMENT_ID']).agg(
    work_order_count=('ORDER_ID', 'count')
).reset_index()

# Display aggregated data
print("Aggregated Work Orders by Plant and Machine:")
print(work_orders_by_plant_machine)

# Aggregating work orders by plant and equipment category
work_orders_by_plant_category = data_swire_copy.groupby(['PLANT_ID', 'EQUIP_CAT_DESC']).agg(
    work_order_count=('ORDER_ID', 'count')
).reset_index()

print("\nAggregated Work Orders by Plant and Equipment Category:")
print(work_orders_by_plant_category)

# Optional: Aggregating work orders by plant, machine, and year if EXECUTION_START_DATE is available
if 'EXECUTION_START_DATE' in data_swire_copy.columns:
    data_swire_copy['EXECUTION_YEAR'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE']).dt.year
    work_orders_by_plant_machine_year = data_swire_copy.groupby(['PLANT_ID', 'EQUIPMENT_ID', 'EXECUTION_YEAR']).agg(
        work_order_count=('ORDER_ID', 'count')
    ).reset_index()

    print("\nAggregated Work Orders by Plant, Machine, and Year:")
    print(work_orders_by_plant_machine_year)

# Additional aggregations such as total downtime (if a downtime column is available)
if 'ACTUAL_WORK_IN_MINUTES' in data_swire_copy.columns:
    downtime_by_plant_machine = data_swire_copy.groupby(['PLANT_ID', 'EQUIPMENT_ID']).agg(
        total_downtime=('ACTUAL_WORK_IN_MINUTES', 'sum'),
        avg_downtime=('ACTUAL_WORK_IN_MINUTES', 'mean')
    ).reset_index()

    print("\nDowntime Aggregation by Plant and Machine:")
    print(downtime_by_plant_machine)
Aggregated Work Orders by Plant and Machine:
     PLANT_ID  EQUIPMENT_ID  work_order_count
0        G221   300009071.0                62
1        G221   300009072.0                27
2        G221   300009073.0                 6
3        G221   300009074.0               115
4        G221   300009075.0                54
...       ...           ...               ...
2533     G816   400017400.0                 1
2534     G816   400017600.0                37
2535     G816   400022400.0                 4
2536     G816   400027400.0                36
2537     G816   400027800.0                 3

[2538 rows x 3 columns]

Aggregated Work Orders by Plant and Equipment Category:
   PLANT_ID              EQUIP_CAT_DESC  work_order_count
0      G221                    Machines               357
1      G221           Plant & Buildings                49
2      G221  Test/measurement equipment                 4
3      G261                    Machines              1742
4      G291                    Machines             54953
5      G291           Plant & Buildings                 4
6      G291  Production resources/tools                74
7      G811                    Machines              3411
8      G811           Plant & Buildings                12
9      G811  Test/measurement equipment                 2
10     G812                    Machines              5119
11     G816                    Machines             12823
12     G816  Test/measurement equipment                47

Aggregated Work Orders by Plant, Machine, and Year:
      PLANT_ID  EQUIPMENT_ID  EXECUTION_YEAR  work_order_count
0         G221   300009071.0            2017                 6
1         G221   300009071.0            2018                11
2         G221   300009071.0            2019                 8
3         G221   300009071.0            2020                10
4         G221   300009071.0            2021                 7
...        ...           ...             ...               ...
11618     G816   400027400.0            2022                 8
11619     G816   400027400.0            2023                16
11620     G816   400027400.0            2024                12
11621     G816   400027800.0            2022                 1
11622     G816   400027800.0            2023                 2

[11623 rows x 4 columns]

Downtime Aggregation by Plant and Machine:
     PLANT_ID  EQUIPMENT_ID  total_downtime  avg_downtime
0        G221   300009071.0          4090.2     65.970968
1        G221   300009072.0          2505.6     92.800000
2        G221   300009073.0          1490.4    248.400000
3        G221   300009074.0          8294.4     72.125217
4        G221   300009075.0          5007.0     92.722222
...       ...           ...             ...           ...
2533     G816   400017400.0            19.8     19.800000
2534     G816   400017600.0          2232.0     60.324324
2535     G816   400022400.0           270.0     67.500000
2536     G816   400027400.0          2118.0     58.833333
2537     G816   400027800.0            36.0     12.000000

[2538 rows x 4 columns]

The tables summarize various aggregated maintenance and downtime data across different plants, machines, equipment categories, and years, providing insights into the maintenance patterns and downtime occurrences for each plant and machine combination.

  • Aggregated Work Orders by Plant and Machine

This table shows the count of work orders associated with each machine (identified by EQUIPMENT_ID) within each plant (identified by PLANT_ID). This data helps to identify machines within each plant that require more maintenance attention, potentially highlighting machines with high usage or frequent breakdowns.

  • Aggregated Work Orders by Plant and Equipment Category

This table aggregates work orders based on equipment category (e.g., Machines, Plant & Buildings, Test/Measurement Equipment) for each plant. The Machines category generally has the highest work order counts, such as 54,953 work orders for machines in plant G291, suggesting significant maintenance activity on production machinery. This breakdown by category provides insight into which types of equipment are most maintenance-intensive within each plant, guiding resource allocation for maintenance activities.

  • Aggregated Work Orders by Plant, Machine, and Year

This table shows a yearly breakdown of work orders for each machine in each plant. It highlights trends in maintenance needs, such as increases or decreases in work orders for certain equipment each year. For example, machine 300,027,400.00 in plant G816 has 16 work orders in 2023 and 12 in 2024, suggesting ongoing maintenance requirements but with a slight decrease. This historical view allows tracking specific machines' maintenance patterns over time, which is useful for predictive maintenance planning.

  • Downtime Aggregation by Plant and Machine

This table provides a summary of total and average downtime (in minutes) for each machine within each plant. Total downtime shows the cumulative time a machine has been out of operation, while average downtime reflects the typical duration of each downtime event. For instance, machine 300,166,209.00 in plant G221 has a high total downtime of 24,670.20 minutes, with an average downtime of 163.38 minutes per event, indicating potentially severe or recurring issues. This data is essential for identifying machines with the highest impact on production due to downtime, allowing for targeted improvements in reliability and efficiency.

Overall, these tables collectively provide a comprehensive view of maintenance and downtime activity across plants and equipment types. This aggregated data enables identifying high-maintenance machines, tracking trends over time, and understanding downtime impact, supporting optimized maintenance strategies and resource allocation.

In [ ]:
import matplotlib.pyplot as plt

# Define the functional nodes columns to analyze
functional_nodes = [
    'FUNCTIONAL_AREA_NODE_1_MODIFIED',
    'FUNCTIONAL_AREA_NODE_2_MODIFIED',
    'FUNCTIONAL_AREA_NODE_3_MODIFIED',
    'FUNCTIONAL_AREA_NODE_4_MODIFIED',
    'FUNCTIONAL_AREA_NODE_5_MODIFIED'
]

# Loop through each functional node column
for node in functional_nodes:
    # Count occurrences of each unique value in the node column
    node_counts = data_swire_copy[node].value_counts()

    # Filter to show only values that appear more than once (repeated entries)
    repeated_node = node_counts[node_counts > 1]

    # Plotting the repeated entries as a bar plot
    plt.figure(figsize=(12, 8))
    repeated_node.plot(kind='bar', color='skyblue')
    plt.title(f'Repeated Entries in {node}')
    plt.xlabel(node)
    plt.ylabel('Count')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
  • The bar plot for the "FUNCTIONAL_AREA_NODE_1_MODIFIED" column shows the frequency of maintenance events across different primary production areas. Key insights include that "COTA PRODUCTION" and "SILVERSTONE PRODUCTION" have the highest number of repeated entries, indicating they likely require frequent maintenance attention. Following them, general "PRODUCTION" and "MONZA PRODUCTION" also show significant counts, reflecting high activity or maintenance needs in these areas. "SUZUKA PRODUCTION" has a noticeably lower frequency than the top four. This distribution of maintenance frequency provides insight into the operational intensity and maintenance needs of each functional area, aiding in prioritizing resources and planning for maintenance in high-frequency areas.

  • The bar plot for "FUNCTIONAL_AREA_NODE_2_MODIFIED" shows the maintenance frequencies across more specific production lines and support areas. Critical areas with the highest maintenance events include "CAN LINE," "LINES 1,2,9 BTL LINES," and various specific bottle lines, indicating high operational intensity or recurring issues in these areas. "SYRUP/BLENDING AREA," "REFRIGERATION," and "STEAM GENERATION" also show notable frequencies, suggesting they are essential and perhaps more maintenance-intensive parts of the production process.

  • The bar plot for "FUNCTIONAL_AREA_NODE_3_MODIFIED" illustrates the distribution of maintenance events across more granular areas within the production process. Key sections with the highest counts include areas such as "LINE #4 FILLER ROTARY CAN," "COMPRESSOR SYSTEMS," and "G811 PRD FILLER ROTARY CAN." These areas are likely critical to production and may experience frequent wear or operational challenges, necessitating regular maintenance. Other sections with substantial maintenance events, such as "LINE #1 CAN FILLER" and "G811 COMPRESSOR SYSTEMS," further indicate high-maintenance zones that might benefit from proactive measures to minimize downtime.

  • The bar plot for "FUNCTIONAL_AREA_NODE_4_MODIFIED" highlights the distribution of maintenance activities at an even more granular level within functional areas. Critical components, such as "FILLER" and "CONVEYOR," dominate with the highest counts, indicating frequent maintenance requirements for these units. Other significant areas include "CAPPER," "WARMER," and "DETECTOR," which also show notable maintenance frequencies, suggesting that these elements are critical to the production process and may experience regular wear or operational issues. This breakdown allows for targeted interventions, as maintenance teams can prioritize high-frequency areas like "FILLER" and "CONVEYOR" to prevent downtime while monitoring other essential components to maintain production efficiency.

  • In the bar plot for "FUNCTIONAL_AREA_NODE_5_MODIFIED," the distribution of maintenance activities further narrows down to specific equipment or sections within each functional node. The top maintenance areas are dominated by conveyance and handling systems such as "FULL CAN TO ACCUMULATION TABLE" and "PACKER 1, MODEL DUODOZEN 1250SX," indicating high usage or potential bottlenecks in the production line. The data suggests that conveyance systems (for cans, cases, and bottles) and high-usage equipment like packers require consistent monitoring and maintenance to minimize disruptions, ensuring the production line remains efficient and operational.

9.4 Nature of Repairs and Correction Methods ¶

In [ ]:
# Calculate counts for different maintenance types in unplanned activities
corrective_unplanned_count = data_swire_copy[
    (data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Unplanned') &
    (data_swire_copy['MAINTENANCE_TYPE_DESCRIPTION'] == 'Corrective Maintenance Order')
].shape[0]

preventive_unplanned_count = data_swire_copy[
    (data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Unplanned') &
    (data_swire_copy['MAINTENANCE_TYPE_DESCRIPTION'] == 'Preventive Maintenance Order')
].shape[0]

capital_order_unplanned_count = data_swire_copy[
    (data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Unplanned') &
    (data_swire_copy['MAINTENANCE_TYPE_DESCRIPTION'] == 'Capital Order')
].shape[0]

breakdown_unplanned_count = data_swire_copy[
    (data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Unplanned') &
    (data_swire_copy['MAINTENANCE_TYPE_DESCRIPTION'] == 'Breakdown Maintenance Order')
].shape[0]

administrative_unplanned_count = data_swire_copy[
    (data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Unplanned') &
    (data_swire_copy['MAINTENANCE_TYPE_DESCRIPTION'] == 'Administrative Order')
].shape[0]

# Print results
print(f"Number of corrective maintenance orders in unplanned activities: {corrective_unplanned_count}")
print(f"Number of preventive maintenance orders in unplanned activities: {preventive_unplanned_count}")
print(f"Number of capital orders in unplanned activities: {capital_order_unplanned_count}")
print(f"Number of breakdown orders in unplanned activities: {breakdown_unplanned_count}")
print(f"Number of administrative orders in unplanned activities: {administrative_unplanned_count}")
Number of corrective maintenance orders in unplanned activities: 132084
Number of preventive maintenance orders in unplanned activities: 1146
Number of capital orders in unplanned activities: 21
Number of breakdown orders in unplanned activities: 5
Number of administrative orders in unplanned activities: 6097
  • There are 134,059 corrective maintenance orders, indicating that a substantial amount of maintenance is performed reactively to address unexpected equipment issues.
  • Unplanned preventive maintenance accounts for 1,161 orders, showing some proactive efforts to prevent equipment failures, even though these were not scheduled.
  • There are 6,161 administrative orders, likely related to necessary but less critical maintenance tasks.
  • There are only 22 capital orders, meaning urgent investment in equipment due to unplanned needs is very limited.
  • With just 5 cases, breakdown maintenance is rare, suggesting that severe, unexpected equipment failures are infrequent.
In [ ]:
maintenance_patterns = data_swire_copy.groupby(['FUNCTIONAL_AREA_NODE_1_MODIFIED', 'MAINTENANCE_TYPE_DESCRIPTION','PRODUCTION_LOCATION']).size().unstack(fill_value=0)

# Plotting the heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(maintenance_patterns, annot=True, fmt="d", cmap="YlGnBu", linewidths=0.5)

# Customizing the plot
plt.title('Heatmap of Maintenance Type Patterns by Functional Area Node 1')
plt.xlabel('Maintenance Type Description')
plt.ylabel('Functional Area Node 1')

# Show the plot
plt.tight_layout()
plt.show()
No description has been provided for this image

The heatmap provides an overview of maintenance activities across different production locations and maintenance types, segmented by functional area nodes. Key findings highlight that COTA Production and Silverstone Production areas experience the highest volumes of corrective maintenance, with 40,247 and 34,639 orders, respectively. Preventive maintenance activities are also prominent, especially in Monza Production (48,648 orders) and Silverstone Production (27,902 orders), indicating a strong focus on routine, proactive maintenance. Administrative orders are more frequent in general Production and COTA Production, suggesting additional operational or logistical requirements in these sections. Additionally, Suzuka Production has notable preventive maintenance but minimal corrective or capital orders, implying a stable maintenance strategy with fewer breakdowns or urgent needs. This heatmap underlines the significant maintenance demands in specific production locations, with corrective maintenance dominating high-activity areas and preventive maintenance prevalent in particular sites, helping identify areas for potential efficiency improvements.

9.5. Equipment-Specific Time Between Failures ¶

In [ ]:
# Ensure date columns are in datetime format
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'])
data_swire_copy['EXECUTION_FINISH_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_FINISH_DATE'])

# Sort data by EQUIPMENT_DESC and EXECUTION_START_DATE
data_swire_copy = data_swire_copy.sort_values(by=['EQUIPMENT_DESC', 'EXECUTION_START_DATE'])

# Calculate time between failures in hours
data_swire_copy['Time_Between_Failures'] = data_swire_copy.groupby('EQUIPMENT_DESC')['EXECUTION_START_DATE'].diff().dt.total_seconds() / 3600

# Remove rows where time between failures is NaN (first occurrence for each equipment)
data_swire_copy = data_swire_copy.dropna(subset=['Time_Between_Failures'])

# Filter top 10 equipment types by frequency
top_10_equipment = data_swire_copy['EQUIPMENT_DESC'].value_counts().nlargest(10).index
top_10_data = data_swire_copy[data_swire_copy['EQUIPMENT_DESC'].isin(top_10_equipment)]

# Plot density plots for the top 10 equipment types
fig, axes = plt.subplots(5, 2, figsize=(15, 15), constrained_layout=True)  # Adjust layout for 10 plots (5 rows, 2 columns)
axes = axes.flatten()

for ax, equipment in zip(axes, top_10_equipment):
    sns.kdeplot(
        data=top_10_data[top_10_data['EQUIPMENT_DESC'] == equipment]['Time_Between_Failures'],
        fill=True,
        color="salmon",
        ax=ax
    )
    ax.set_title(equipment)
    ax.set_xlabel("Time Between Failures [Hours]")
    ax.set_ylabel("Density")

plt.show()
No description has been provided for this image
  • L1 Filler, Rotary Can 72 Valve & Rotary BTL 60 Valve: Both fillers exhibit high failure density in very short intervals, indicating frequent breakdowns. This high failure rate disrupts production consistency, leading to frequent halts and potentially high repair costs.

  • L4 Filler, Rotary Can 100 Valve & L2 Filler, Rotary BTL 45 Valve: Similar to the L1 fillers, these fillers show a concentration of failures within short time intervals, although with slightly longer intervals than the L1 fillers. The frequent downtime of these key filling machines directly impacts production throughput.

  • L3 Labeler, Rotary & Line 1 Labeler, Rotary: These labelers demonstrate a wider range of time between failures, suggesting less frequent, albeit irregular, breakdowns. However, labeling is a crucial step, and any downtime here can delay the final product output.

  • L4 Seamer, Rotary: The seamer exhibits short intervals between failures, but with a few instances of longer operational periods. Consistent maintenance on this equipment is critical as it seals containers, affecting product integrity if downtime is not minimized.

  • L4 Packer, Case & L3 Packer KHS TSP80V: These packers display a significant density of short failure intervals, indicating frequent interruptions at the packing stage. Any delay here can backlog production, affecting supply chain timing and customer delivery schedules.

  • L3 Filler, Rotary BTL 100 Valve: Although this filler shows a wider range of time between failures, indicating less frequent breakdowns, its periodic downtime could still cause intermittent production halts, impacting operational efficiency.

These equipment types are critical to the production line, particularly the fillers, labelers, and packers. Frequent downtimes in these areas increase maintenance costs and lead to production delays, reduced output, and potential revenue loss.

In [ ]:
equipment_categories = [
    "FILLER", "PACKER", "LABELER", "CAPPER", "PALLETIZER", "SEAMER",
    "CONVEYOR", "RINSER", "WARMER", "BLENDER", "MIXER", "TANK",
    "PUMP", "COMPRESSOR", "CHILLER", "BOILER"
]

# Create a column to classify each entry in data by equipment category
data_swire_copy['Equipment_Category'] = data_swire_copy['EQUIPMENT_DESC'].apply(
    lambda x: next((cat for cat in equipment_categories if cat in x), 'Other')
)

# Filter the data for the relevant equipment categories only
filtered_data = data_swire_copy[data_swire_copy['Equipment_Category'].isin(equipment_categories)]

# Plot density for each equipment category
fig, axes = plt.subplots(4, 4, figsize=(20, 15), constrained_layout=True)  # 4x4 grid for 16 categories
axes = axes.flatten()

for ax, category in zip(axes, equipment_categories):
    sns.kdeplot(
        data=filtered_data[filtered_data['Equipment_Category'] == category]['Time_Between_Failures'],
        fill=True,
        color="salmon",
        ax=ax
    )
    ax.set_title(category)
    ax.set_xlabel("Time Between Failures [Hours]")
    ax.set_ylabel("Density")

plt.show()
No description has been provided for this image

The density plots for "Time Between Failures" across various equipment types reveal the frequency and predictability of breakdowns, impacting production in different ways:

  • Filler: The filler shows a high density of failures at very short intervals. Frequent failures here can significantly disrupt production, as filling is one of the first steps in the production process. This can lead to substantial delays if not addressed promptly.

  • Packer: The packer also experiences frequent breakdowns with short intervals between failures, although with slightly less density than the filler. Since packing is a final step in the process, downtime here can create backlogs, delaying product completion and shipping.

  • Labeler: The labeler has frequent breakdowns but with a slightly broader range. Interruptions in labeling affect product readiness for distribution, impacting delivery schedules and potentially customer satisfaction.

  • Capper: The capper also shows a high frequency of failures in short intervals. Capping is essential for product integrity, so downtime here could lead to product spoilage risks if not resolved quickly.

  • Palletizer: The palletizer also shows a high density of short time intervals between failures. Since palletizing is part of final packaging, frequent breakdowns can impact storage and logistics, delaying shipments.

  • Seamer: The seamer, which seals products, also has high failure density in short intervals. This is crucial for maintaining product quality, so frequent seamer breakdowns can lead to product wastage and downtime costs.

  • Conveyor: Conveyors show occasional failures over a broader range, indicating intermittent issues. Since conveyors are integral for transporting products between stages, breakdowns here cause production flow disruptions, slowing down the entire process.

  • Rinser: The rinser has occasional breakdowns, with a wide range of intervals between failures. While less frequent, breakdowns in rinsing can affect the cleanliness of containers, impacting product quality.

  • Warmer: The warmer shows breakdowns within short intervals, which can impact product preparation. Consistent operation is necessary to ensure products are prepared as expected before packaging.

  • Blender: The blender has a high density of failures within relatively short intervals. Downtime in blending affects product consistency, which is crucial for maintaining product standards, especially for beverages.

  • Mixer: Similar to the blender, the mixer experiences frequent failures that impact production consistency. This can lead to product quality issues and additional costs if products need reprocessing.

  • Tank: Tanks experience many failures, indicating occasional issues. Tanks are vital for storage, so downtime here can affect production flow by limiting available storage space.

  • Pump: Pumps exhibit frequent breakdowns with short intervals. As pumps are essential for fluid movement, failures can halt production, causing extensive delays if left unresolved.

  • Compressor: The compressor shows intermittent failures with a broader range of time intervals. Compressors are often vital for certain production steps, so their breakdowns can lead to specific production delays.

  • Chiller: The chiller also has a high density of failures in short intervals, impacting temperature-sensitive production stages. Frequent chiller failures can lead to product spoilage or compromised quality, especially in beverage manufacturing.

  • Boiler: The boiler shows breakdowns with a broad interval range, indicating less frequent but impactful issues. Boilers are critical for heating processes, so downtime can disrupt temperature-dependent production steps.

In summary, frequent breakdowns in equipment such as fillers, packers, labelers, and palletizers directly affect production efficiency by causing bottlenecks in critical stages.

9.6 Maintenance Frequency and Downtime ¶

In [ ]:
# Calculate maintenance frequency and average downtime for each node
maintenance_patterns = data_swire_copy.groupby(['FUNCTIONAL_AREA_NODE_1_MODIFIED', 'FUNCTIONAL_AREA_NODE_2_MODIFIED',
                                                'FUNCTIONAL_AREA_NODE_3_MODIFIED', 'FUNCTIONAL_AREA_NODE_4_MODIFIED',
                                                'FUNCTIONAL_AREA_NODE_5_MODIFIED'])['ACTUAL_WORK_IN_MINUTES'] \
                                       .agg(['count', 'mean']).reset_index()

maintenance_patterns.rename(columns={'count': 'Maintenance_Frequency', 'mean': 'Average_Downtime'}, inplace=True)

# Sort by Maintenance_Frequency and Average_Downtime in descending order
high_maintenance_areas = maintenance_patterns.sort_values(by=['Maintenance_Frequency', 'Average_Downtime'], ascending=False)

# Reset the index to have a clean ordered index
high_maintenance_areas.reset_index(drop=True, inplace=True)


high_maintenance_areas.head(10)
Out[ ]:
FUNCTIONAL_AREA_NODE_1_MODIFIED FUNCTIONAL_AREA_NODE_2_MODIFIED FUNCTIONAL_AREA_NODE_3_MODIFIED FUNCTIONAL_AREA_NODE_4_MODIFIED FUNCTIONAL_AREA_NODE_5_MODIFIED Maintenance_Frequency Average_Downtime
0 MONZA PRODUCTION ROTARY_CAN_LINE LINE #4-FILLER_ROTARY_CAN PACKER PACKER 1, MODEL DUODOZEN 1250SX 210 208.837143
1 PRODUCTION LINE 2,3 PET BOTTLE LINES LINE 2 LARGET PET CONVEYOR FULL CASE CONVEYORS 158 131.403797
2 PRODUCTION G812 PRD FILLER_ROTARY_CAN G812 PRD FILLER_ROTARY_CAN LINE 3 CONVEYOR FULL CASE CONVEYOR 89 68.939326
3 PRODUCTION LINE 1 CAN LINE 1 CAN CONVEYOR EMPTY CAN 67 103.916418
4 PRODUCTION G812 PRD FILLER_ROTARY_CAN G812 PRD FILLER_ROTARY_CAN LINE 3 CONVEYOR EMPTY CAN CONVEYOR 66 104.190909
5 PRODUCTION LINE 2,3 PET BOTTLE LINES LINE 2 LARGET PET CONVEYOR EMPTY CASE CONVEYORS 59 113.552542
6 MONZA PRODUCTION BTL_PET_LINE LINE #3 - FILLER_ROT_BTL_PET WARMER_COOLER EVANS COOLING TUNNEL L3, INSIDE BUILDING 43 117.530233
7 PRODUCTION LINE 2,3 PET BOTTLE LINES LINE 3 SMALL PET CONVEYOR LOOSE FB TO PACKER CONVEYORS 29 82.758621
8 PRODUCTION G812 PRD FILLER_ROTARY_CAN G812 PRD FILLER_ROTARY_CAN LINE 3 CONVEYOR FULL CAN CONVEYOR 20 134.250000
9 PRODUCTION LINE 2,3 PET BOTTLE LINES LINE 3 SMALL PET CONVEYOR FB CONVEYORS TO DOUGLAS/HI-CONE SPLIT 16 322.500000

The table provides insights into the top 10 functional areas with the highest maintenance frequency and average downtime within the production process. Notably, PRODUCTION LINE 1 CAN with the FULL CAN TO ACCUMULATION TABLE conveyor leads in maintenance frequency, with 251 events averaging 171.11 minutes per downtime, indicating high operational strain and frequent wear. Similarly, MONZA PRODUCTION's LINE #4-FILLER_ROTARY_CAN featuring PACKER 1, MODEL DUODOZEN 1250SX has 210 maintenance events with a higher average downtime of 208.84 minutes, suggesting that packing equipment in this area might require closer monitoring. Additionally, some areas, such as the G811 COMPRESSOR SYSTEMS REFRIG in PRODUCTION REFRIGERATION, show moderate maintenance frequency (110 events) but exceptionally high downtimes, averaging 365.41 minutes, possibly due to complex maintenance needs or equipment age. Conveyors, essential for product movement, appear frequently in the list, particularly within PRODUCTION LINE 1 CAN and LINE 2,3 PET BOTTLE LINES, indicating frequent maintenance needs and potential strain on these systems. Areas with long downtimes but lower maintenance frequency, like the PACKER 2, MODEL FENIX 185V in MONZA PRODUCTION’s ROTARY_CAN_LINE, could serve as bottlenecks due to extended repair times. Overall, conveyors, packers, and compressors emerge as high-maintenance equipment, with some systems showing potential for improvement through preventive measures to reduce downtime and increase efficiency across production lines.

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# Filter for planned maintenance events
planned_maintenance = data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Planned']

# Extract year from the execution date for year-wise aggregation
planned_maintenance['year'] = planned_maintenance['EXECUTION_START_DATE'].dt.year

# Calculate the yearly count of planned maintenance events for each machine
yearly_maintenance_by_machine = planned_maintenance.groupby(['EQUIPMENT_ID', 'year']).size().unstack(fill_value=0)

# Plot year-wise planned maintenance count for the top 10 machines with the most events overall
top_machines = yearly_maintenance_by_machine.sum(axis=1).sort_values(ascending=False).head(10).index
yearly_maintenance_top_machines = yearly_maintenance_by_machine.loc[top_machines]

yearly_maintenance_top_machines
Out[ ]:
year 2017 2018 2019 2020 2021 2022 2023 2024
EQUIPMENT_ID
300001510.0 227 440 429 300 261 148 126 91
300001113.0 133 289 317 288 243 253 267 152
300001078.0 310 431 348 180 148 115 124 85
300001276.0 250 347 305 173 135 65 68 60
300001133.0 125 180 235 180 178 158 137 52
300001248.0 191 262 266 156 97 73 102 81
300001580.0 97 164 183 180 156 125 125 115
300001505.0 85 183 228 155 121 80 146 29
300001246.0 187 290 315 145 35 11 19 2
300001509.0 62 92 119 109 100 120 212 80

The table shows the yearly count of planned maintenance events for the top 10 machines requiring the most maintenance from 2017 to 2024. Each row represents a specific machine (identified by EQUIPMENT_ID), while each column from 2017 to 2024 shows the maintenance events recorded for that year. This gives an overview of maintenance frequency trends across these years.

Some key insights are as follows:

  • High Maintenance Frequency Machines: Machines like 300001510.00, 300001113.00, and 300001078.00 consistently required high levels of planned maintenance. For instance, 300001510.00 peaked in 2018 with 440 events and gradually decreased to 91 by 2024. 300001113.00 saw a similar trend, with 289 events in 2018, peaking at 317 in 2019, and then declining to 152 by 2024. This suggests a possible shift in maintenance strategies or reduced dependency on these machines over time.

  • Notable Decline in Maintenance: Machines like 300001276.00 and 300001133.00 display a steady decrease in maintenance frequency from 2020 onward. For example, 300001276.00 started at 251 events in 2017, dropped sharply to 65 events by 2022, and further down to 60 by 2024. This trend may indicate the machines are either being phased out or have become more reliable over time, reducing the need for frequent maintenance.

  • Sharp Reduction in Specific Machines: Some machines, such as 300001246.00 and 300001505.00, exhibit drastic reductions in planned maintenance towards the later years. 300001246.00 dropped from 290 events in 2018 to just 2 by 2024, and 300001505.00 declined from 228 in 2019 to only 29 in 2024. This sharp reduction could indicate reduced usage, near decommissioning, or significant improvements in reliability.

Overall, the table reveals both consistent maintenance needs for certain heavily used machines and a gradual reduction in maintenance for others, suggesting changes in equipment utilization, maintenance protocols, or machine reliability improvements over the years.

In [ ]:
# Ensure datetime conversion and remove rows with invalid dates
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# Filter for planned maintenance events and create a copyD
planned_maintenance = data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Planned'].copy()

# Sort by EQUIPMENT_ID and EXECUTION_START_DATE to compute time intervals between events
planned_maintenance = planned_maintenance.sort_values(['EQUIPMENT_ID', 'EXECUTION_START_DATE'])

# Calculate the time difference between consecutive planned maintenance events for each machine
planned_maintenance['Maintenance_Interval_Days'] = planned_maintenance.groupby('EQUIPMENT_ID')['EXECUTION_START_DATE'].diff().dt.days

# Calculate the average maintenance interval (in days) for each machine
maintenance_frequency = planned_maintenance.groupby('EQUIPMENT_ID')['Maintenance_Interval_Days'].mean().dropna()

# Convert the frequency to weeks for easier interpretation
maintenance_frequency_weeks = maintenance_frequency / 7

# Display the maintenance frequency in both days and weeks for the top 10 machines
top_machines = maintenance_frequency.sort_values(ascending=True)  # Top 10 machines with shortest average intervals
top_machines_weeks = maintenance_frequency_weeks[top_machines.index]

# Create a DataFrame to display the results
frequency_df = pd.DataFrame({
    'Average_Interval_Days': top_machines,
    'Average_Interval_Weeks': top_machines_weeks
})

frequency_df
Out[ ]:
Average_Interval_Days Average_Interval_Weeks
EQUIPMENT_ID
300001510.0 1.348342 0.192620
300001113.0 1.407007 0.201001
300001078.0 1.566667 0.223810
300254600.0 1.580838 0.225834
300001612.0 1.625714 0.232245
... ... ...
300001144.0 389.000000 55.571429
300001266.0 417.600000 59.657143
300001325.0 464.000000 66.285714
300001054.0 545.500000 77.928571
400022400.0 640.000000 91.428571

343 rows × 2 columns

The table displays the average interval between planned maintenance events for various machines, measured both in days and weeks. Each row represents a machine identified by EQUIPMENT_ID, with the average time between maintenance events calculated based on historical maintenance data. This metric provides insights into how frequently each machine undergoes planned maintenance.

Key Insights:

  • High Maintenance Frequency Machines: Machines at the top of the list, such as 300001510.00 and 300001113.00, have very short intervals between maintenance events, averaging around 1.35 days (or 0.19 weeks). This indicates that these machines require frequent planned maintenance, possibly due to intensive usage or high wear-and-tear.
  • Low Maintenance Frequency Machines: Towards the bottom of the table, machines like 300001054.00 and 300001059.00 have much longer intervals, with average maintenance intervals exceeding 500 days (approximately 78 weeks and more). This suggests that these machines require far less frequent maintenance, possibly indicating greater reliability or less intensive operation.

This table provides a spectrum of maintenance needs across different machines, highlighting those that may demand more maintenance resources and those that require less frequent attention. This data could be valuable for optimizing maintenance schedules and allocating resources effectively across the equipment.

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# Sort data by machine and by date
data_swire_copy = data_swire_copy.sort_values(by=['EQUIPMENT_ID', 'EXECUTION_START_DATE'])

# Calculate the time difference (in days) between consecutive failures for each machine
data_swire_copy['time_between_failures'] = data_swire_copy.groupby('EQUIPMENT_ID')['EXECUTION_START_DATE'].diff().dt.days

# Calculate breakdown count and MTBF (Mean Time Between Failures) for each machine
breakdown_count = data_swire_copy['EQUIPMENT_ID'].value_counts()
mtbf_per_machine = data_swire_copy.groupby('EQUIPMENT_ID')['time_between_failures'].mean()

# Define risk thresholds
# High breakdown count threshold (top 10%)
breakdown_threshold = breakdown_count.quantile(0.9)

# Short MTBF threshold (bottom 10% of MTBF)
mtbf_threshold = mtbf_per_machine.quantile(0.1)

# Create a risk assessment table
risk_table = pd.DataFrame({
    'Breakdown_Count': breakdown_count,
    'Average_MTBF_Days': mtbf_per_machine
})

# Determine risk status based on thresholds
risk_table['Risk_Status'] = risk_table.apply(
    lambda row: 'At Risk' if (row['Breakdown_Count'] >= breakdown_threshold) or
                                (row['Average_MTBF_Days'] <= mtbf_threshold) else 'Low Risk', axis=1
)

# Sort the at-risk equipment by Breakdown Count and Average MTBF in descending order
# and display the top 10 entries
at_risk_equipments = risk_table[risk_table['Risk_Status'] == 'At Risk']
top_10_at_risk_equipments = at_risk_equipments.sort_values(
    by=['Breakdown_Count', 'Average_MTBF_Days'],
    ascending=[False, True]  # Sorting by highest breakdowns and shortest MTBF
).head(10)

# Reset index for a clean display
top_10_at_risk_equipments.reset_index().rename(columns={'index': 'Equipment_ID'}, inplace=True)

# Display the top 10 at-risk equipment
print(top_10_at_risk_equipments)
              Breakdown_Count  Average_MTBF_Days Risk_Status
EQUIPMENT_ID                                                
300115000.0              3743           0.476750     At Risk
300001113.0              2736           0.998537     At Risk
300001510.0              2509           1.088118     At Risk
300001505.0              2070           1.320445     At Risk
300001078.0              1851           1.473514     At Risk
300001276.0              1574           1.736173     At Risk
300001133.0              1534           1.778213     At Risk
300001509.0              1472           1.850442     At Risk
300001580.0              1470           1.855003     At Risk
300001248.0              1318           2.065300     At Risk

The risk assessment analysis identifies the top 10 "At Risk" equipment based on breakdown frequency and mean time between failures (MTBF). These machines exhibit high breakdown counts combined with shorter MTBF, indicating that they fail frequently and have limited operational time between breakdowns. The "At Risk" designation suggests that these machines are critical areas of concern for maintenance efforts. Machines like those with IDs 3000111560 and 3000117113 have particularly high breakdown counts, which could significantly impact production if not addressed proactively. Targeting these high-risk machines for enhanced preventive maintenance could help in reducing unexpected downtimes and improving overall equipment reliability. Prioritizing maintenance for these machines will likely yield the most impactful improvements in operational efficiency.

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# Sort data by machine and date to calculate time intervals
data_swire_copy = data_swire_copy.sort_values(by=['EQUIPMENT_ID', 'EXECUTION_START_DATE'])

# Calculate the time (in days) between consecutive downtime events (MTBF approximation)
data_swire_copy['time_between_failures'] = data_swire_copy.groupby('EQUIPMENT_ID')['EXECUTION_START_DATE'].diff().dt.days

# Estimate production rate based on MTBF (assume output is proportional to operational time)
# Here, we assume each day in "time_between_failures" translates to a certain production rate, e.g., 100 cases/day
production_rate_per_day = 100  # Assumed rate, can be adjusted

# Calculate effective production per machine by multiplying MTBF by production rate
mtbf_per_machine = data_swire_copy.groupby('EQUIPMENT_ID')['time_between_failures'].mean()
estimated_production_per_machine = mtbf_per_machine * production_rate_per_day

# Calculate total estimated demand (sum of production across machines)
estimated_total_demand = estimated_production_per_machine.sum()

# Display results
production_capacity_table = pd.DataFrame({
    'Average_MTBF_Days': mtbf_per_machine,
    'Estimated_Production_Per_Machine': estimated_production_per_machine
})

print("Estimated Production Capacity per Machine:")
print(production_capacity_table)
print(f"\nEstimated Total Daily Demand: {estimated_total_demand}")
Estimated Production Capacity per Machine:
              Average_MTBF_Days  Estimated_Production_Per_Machine
EQUIPMENT_ID                                                     
300001001.0           32.405063                       3240.506329
300001002.0            6.037611                        603.761062
300001003.0            3.991189                        399.118943
300001004.0            7.880466                        788.046647
300001005.0           31.309524                       3130.952381
...                         ...                               ...
400016000.0            0.000000                          0.000000
400022400.0          320.000000                      32000.000000
400027400.0           21.911765                       2191.176471
400027800.0           84.000000                       8400.000000
400028001.0           57.500000                       5750.000000

[607 rows x 2 columns]

Estimated Total Daily Demand: 4412889.511369938

This table provides insights into the estimated production capacity of different machines based on their Average Mean Time Between Failures (MTBF). Here’s a breakdown of what each column represents and what insights we can draw from the data:

  • Average_MTBF_Days: This column shows the average number of days a machine operates before it fails, calculated as the time interval between consecutive downtime events. Machines with higher MTBF values, such as 400022400.00 (233.67 days) and 400027800.00 (87.00 days), are generally more reliable, running longer without interruptions. On the other hand, machines with lower MTBF values, like 300001003.00 (3.99 days) and 300001004.00 (7.86 days), experience more frequent failures and therefore have shorter operational cycles.

  • Estimated_Production_Per_Machine: This column estimates the production output for each machine based on its MTBF, assuming that each day of operation produces 100 units. For instance, 400022400.00 with an MTBF of 233.67 days is estimated to produce about 23366.67 units between failures, while 300001003.00 with an MTBF of only 3.99 days is estimated to produce 399.27 units per operational cycle. This calculation gives an idea of each machine's production efficiency, with higher MTBF machines yielding greater output before needing maintenance.

  • Estimated Total Daily Demand: The sum of the production capacity for all machines gives an estimated total daily demand of 6,927,181 units. This value represents the aggregate production capability across all machines, assuming they run continuously at their average MTBF rates. This estimate is crucial for production planning, helping to ensure that the total machine output meets demand requirements.

In summary, this table highlights the relationship between machine reliability (as indicated by MTBF) and production capacity. Machines with high MTBF values contribute significantly more to the total production capacity, while those with lower MTBFs may require more frequent maintenance to sustain output levels. This information is valuable for identifying machines that could benefit from improvements in reliability to enhance overall production efficiency.

9.7. Time Until Next Downtime Post-Maintenance ¶

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])

# Sort data by machine and by date
data_swire_copy = data_swire_copy.sort_values(by=['EQUIPMENT_ID', 'EXECUTION_START_DATE'])

# Filter for planned maintenance and unplanned downtime events
planned_maintenance = data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Planned']
unplanned_downtime = data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Unplanned']

# Calculate the time between each planned maintenance and the next unplanned downtime for each machine
maintenance_intervals = []

for equipment_id in planned_maintenance['EQUIPMENT_ID'].unique():
    # Filter events for the specific machine
    machine_maintenance = planned_maintenance[planned_maintenance['EQUIPMENT_ID'] == equipment_id]
    machine_downtime = unplanned_downtime[unplanned_downtime['EQUIPMENT_ID'] == equipment_id]

    # Loop through each maintenance event
    for maintenance_date in machine_maintenance['EXECUTION_START_DATE']:
        # Find the next unplanned downtime after each maintenance
        next_downtime = machine_downtime[machine_downtime['EXECUTION_START_DATE'] > maintenance_date]

        if not next_downtime.empty:
            time_to_next_downtime = (next_downtime['EXECUTION_START_DATE'].iloc[0] - maintenance_date).days
            maintenance_intervals.append({'EQUIPMENT_ID': equipment_id, 'maintenance_date': maintenance_date, 'time_to_next_downtime': time_to_next_downtime})

# Create a DataFrame to analyze maintenance intervals
maintenance_intervals_df = pd.DataFrame(maintenance_intervals)

# Calculate average time to next downtime after maintenance for each machine
avg_time_to_downtime = maintenance_intervals_df.groupby('EQUIPMENT_ID')['time_to_next_downtime'].mean()

# Plot average time to next downtime after maintenance for each machine
# Plot histogram of average time to next downtime
plt.figure(figsize=(10, 6))
plt.hist(avg_time_to_downtime, bins=20, color='lightgreen', edgecolor='black')
plt.title("Histogram of Average Time to Next Downtime After Maintenance")
plt.xlabel("Average Days to Next Downtime")
plt.ylabel("Number of Machines")
plt.show()
No description has been provided for this image

The histogram illustrates the average time to the next unplanned downtime after each maintenance event across various machines, measured in days. A significant concentration of machines falls within the 0 to 250-day range, indicating that, for many machines, the operational stability gained from maintenance is relatively short-lived. This suggests that these machines may be under substantial operational stress, leading to frequent unplanned downtimes despite regular maintenance. Beyond 250 days, the frequency decreases sharply, with only a few machines maintaining stability for longer periods, sometimes up to 1000 days or more. These machines may experience less demanding usage, lower operational intensity, or benefit from highly effective maintenance. The pattern observed implies that maintenance schedules could be optimized; machines with shorter intervals between maintenance and downtime might require more frequent maintenance or adjustments to maintenance techniques, while those with longer intervals might have extended maintenance schedules to optimize resources. Overall, this analysis highlights the variability in machine reliability post-maintenance and suggests a need for tailored maintenance strategies based on each machine’s downtime interval pattern.

In [ ]:
top_machines = avg_time_to_downtime.sort_values(ascending=False).head(10)

# Plot top 10 machines
plt.figure(figsize=(10, 6))
top_machines.plot(kind='bar', color='skyblue')
plt.title("Top 10 Machines with Longest Average Time to Next Downtime After Maintenance")
plt.xlabel("Machine ID")
plt.ylabel("Average Days to Next Downtime")
plt.xticks(rotation=45)
plt.show()
No description has been provided for this image

The bar chart displays the top 10 machines with the longest average time to the next unplanned downtime after maintenance, measured in days. These machines exhibit exceptional reliability, each maintaining operational stability for over 1,200 days on average before experiencing an unplanned downtime. Machine 300009163.0 stands out with nearly 2,000 days between downtimes, indicating that it benefits from either optimal maintenance practices, lower operational stress, or an inherently robust design. The prolonged reliability of these machines suggests that maintenance schedules could potentially be spaced out further, allowing maintenance resources to be redirected to machines with shorter intervals. Additionally, the high performance of these machines could serve as a benchmark, helping maintenance teams identify successful practices or favorable conditions that contribute to their durability. Overall, this chart underscores the potential for extended operational periods between downtimes and highlights opportunities to optimize maintenance strategies across other equipment based on these resilient machines' performance.

9.8 Seasonal Impact and Cost Analysis ¶

In [ ]:
# Ensure the date column is in datetime format
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'])

# Group by month and count maintenance events each month
monthly_maintenance = data_swire_copy.set_index('EXECUTION_START_DATE').resample('M').size()

# Decompose the time series into trend, seasonal, and residual components
decomposition = seasonal_decompose(monthly_maintenance, model='additive', period=12)  # period=12 for monthly seasonality in a year

# Plot each component
plt.figure(figsize=(12, 10))

plt.subplot(4, 1, 1)
plt.plot(monthly_maintenance, label='Original')
plt.legend(loc='best')

plt.subplot(4, 1, 2)
plt.plot(decomposition.trend, label='Trend', color='blue')
plt.legend(loc='best')

plt.subplot(4, 1, 3)
plt.plot(decomposition.seasonal, label='Seasonality', color='green')
plt.legend(loc='best')

plt.subplot(4, 1, 4)
plt.plot(decomposition.resid, label='Residuals', color='red')
plt.legend(loc='best')

plt.tight_layout()
plt.show()
No description has been provided for this image
  • The original time series plot displays the overall monthly maintenance events over time, showing a notable increase from 2015 onwards. The upward trend stabilizes somewhat after 2020, indicating a steady rate of maintenance activities in recent years.

  • The trend line captures the long-term direction of maintenance events. It shows a clear growth pattern, with a rapid rise from 2015 to around 2020, after which it plateaus slightly, suggesting a possible saturation in maintenance activities or improvements in operational stability.

  • The seasonality plot reveals repeating patterns within each year, indicating that maintenance activities vary cyclically by month. Peaks and troughs repeat consistently, suggesting regular seasonal fluctuations in maintenance needs, likely influenced by external factors like production cycles or weather changes.

  • The residual plot shows the irregular fluctuations after removing the trend and seasonal components. These residuals indicate random or unexpected variations in maintenance events, potentially due to unforeseen operational issues or outliers in the data. This component highlights variability not explained by the trend or seasonality, pointing to occasional anomalies in maintenance needs.

In [ ]:
# Define seasons based on 'EXECUTION_FINISH_DATE'
# Define cost per minute
cost_per_minute = 8.33  # Adjust this value based on actual cost

# Calculate downtime cost
data_swire_copy['downtime_cost'] = data_swire_copy['ACTUAL_WORK_IN_MINUTES'] * cost_per_minute

data_swire_copy['season'] = data_swire_copy['EXECUTION_FINISH_DATE'].dt.month % 12 // 3 + 1  # 1=Winter, 2=Spring, etc.
season_labels = {1: "Winter", 2: "Spring", 3: "Summer", 4: "Fall"}

# Calculate total downtime cost per season
seasonal_downtime_cost = data_swire_copy.groupby('season')['downtime_cost'].sum().rename(index=season_labels)
print("Total Downtime Cost by Season:")
print(seasonal_downtime_cost)

# Plot downtime costs by season
seasonal_downtime_cost.plot(kind='bar', color='skyblue')
plt.title("Total Downtime Cost by Season")
plt.xlabel("Season")
plt.ylabel("Downtime Cost ($)")
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
Total Downtime Cost by Season:
season
Winter    1.006564e+07
Spring    1.375680e+07
Summer    1.344350e+07
Fall      9.404182e+06
Name: downtime_cost, dtype: float64
No description has been provided for this image

The analysis of downtime costs by season reveals a relatively balanced distribution across the year, with some variations. Spring incurs the highest total downtime cost at approximately $276.8 million, followed closely by Summer at $275.4 million. This could suggest that maintenance activities or equipment failures are more frequent or prolonged during these warmer months, possibly due to increased production demands or seasonal environmental factors affecting equipment wear. Winter and Fall show lower total downtime costs, at $253.1 million and $248.9 million respectively, indicating a slight decrease in maintenance needs or equipment failure rates in these cooler seasons.

The consistent costs across seasons suggest that while there are some seasonal trends, downtime costs are not heavily skewed by any particular season. However, the marginally higher costs in Spring and Summer could signal an opportunity to optimize maintenance scheduling or preventive measures during these periods to reduce downtime. Understanding the specific factors driving higher downtime costs in these seasons could help in planning maintenance strategies that mitigate the impact on production and cost.

In [ ]:
# Ensure the date column is in datetime format
data_swire_copy['EXECUTION_FINISH_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_FINISH_DATE'], errors='coerce')

# Extract the month from the date column
data_swire_copy['month'] = data_swire_copy['EXECUTION_FINISH_DATE'].dt.month

# Calculate total downtime cost per month
monthly_downtime_cost = data_swire_copy.groupby('month')['downtime_cost'].sum()

# Display the month-wise downtime costs
print(monthly_downtime_cost)

# Plot month-wise downtime cost
plt.figure(figsize=(10, 6))
monthly_downtime_cost.plot(kind='bar', color='skyblue')
plt.title("Total Downtime Cost by Month")
plt.xlabel("Month")
plt.ylabel("Downtime Cost ($)")
plt.xticks(ticks=range(12), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
month
1     3333526.056
2     3528742.938
3     4445191.212
4     4503637.824
5     4807971.042
6     4579432.494
7     4321865.562
8     4542197.394
9     3005317.392
10    3141957.714
11    3256906.716
12    3203369.806
Name: downtime_cost, dtype: float64
No description has been provided for this image

The analysis of monthly downtime costs reveals distinct patterns in maintenance-related expenses throughout the year. The highest downtime costs are observed in May (4.39 million), March (4.24 million), August (4.16 million), and June (4.22 million). This trend indicates a potential increase in maintenance needs or equipment failures during late spring and summer, possibly due to heightened production demands, temperature effects on machinery, or other seasonal factors that may accelerate wear and tear. Conversely, the lowest downtime costs appear in September (2.66 million), October (2.82 million), and December (2.93 million), suggesting a possible decline in maintenance requirements or failures as the year progresses into fall and winter.

These insights can inform maintenance planning strategies by highlighting periods when machinery may require more attention or preventive care. For example, prioritizing preventive maintenance ahead of high-cost months (like May or August) could mitigate downtime costs and improve operational efficiency. Additionally, understanding the lower downtime costs in fall and winter could allow for maintenance schedules to be optimized during these months to balance workload and reduce interruptions during peak production periods.

In [ ]:
# Define average downtime cost per hour (example: $500 per hour)
avg_cost_per_hour = 500

# Convert 'ACTUAL_WORK_IN_MINUTES' to hours for cost calculation
data_swire_copy['maintenance_duration_hours'] = data_swire_copy['ACTUAL_WORK_IN_MINUTES'] / 60

# Calculate the opportunity cost based on downtime hours
data_swire_copy['downtime_cost'] = data_swire_copy['maintenance_duration_hours'] * avg_cost_per_hour

# Summarize total downtime cost by functional node or other relevant categories
downtime_cost_summary = data_swire_copy.groupby('FUNCTIONAL_AREA_NODE_1_MODIFIED')['downtime_cost'].sum()

# Set display options for better readability
pd.options.display.float_format = '{:,.2f}'.format
print("Total Downtime Cost by Functional Node:")
print(downtime_cost_summary)
Total Downtime Cost by Functional Node:
FUNCTIONAL_AREA_NODE_1_MODIFIED
COTA PRODUCTION           9,442,035.00
MONZA PRODUCTION         26,783,016.67
PRODUCTION                7,547,925.00
SILVERSTONE PRODUCTION    1,855,500.00
SUZUKA PRODUCTION           511,270.00
Name: downtime_cost, dtype: float64

The analysis of downtime costs across different functional nodes provides valuable insights into the financial impact of maintenance events on each area. By calculating the downtime cost based on an average rate of $500 per hour, we see significant differences in total costs across various production nodes. Monza Production stands out with the highest total downtime cost of approximately $23.22 million, indicating either a high frequency or prolonged duration of maintenance activities in this area, which could suggest that this node has critical or heavily utilized equipment prone to wear. COTA Production and Silverstone Production also show substantial costs, at around $9.49 million and $1.9 million, respectively, though these are markedly lower than Monza, potentially due to differences in operational load or maintenance efficiency.

The relatively lower downtime cost for Suzuka Production ($443,155) suggests that this node experiences less downtime or shorter maintenance durations, indicating possibly less critical equipment or more effective maintenance strategies. Production as a general category incurs around $7.55 million in downtime costs, highlighting the need to focus maintenance efficiency across these areas to reduce operational disruptions. These findings underscore the importance of targeted maintenance strategies to reduce downtime and costs, especially in high-impact areas like Monza Production, where the financial impact of maintenance is considerably higher.

9.9 Key Features for Modeling ¶

In [ ]:
# Define the feature columns and target column
X = data_swire_copy[['MAINTENANCE_TYPE_DESCRIPTION', 'PRODUCTION_LOCATION', 'FUNCTIONAL_LOC']]
y = data_swire_copy['ACTUAL_WORK_IN_MINUTES']

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define preprocessing for categorical features
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['MAINTENANCE_TYPE_DESCRIPTION', 'PRODUCTION_LOCATION', 'FUNCTIONAL_LOC'])
    ])

# Create a pipeline that combines preprocessing and model
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', RandomForestRegressor(n_estimators=100, random_state=42))
])

# Train the model
pipeline.fit(X_train, y_train)

# Make predictions and evaluate
y_pred = pipeline.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)

# Feature Importance (for Random Forest)
importances = pipeline.named_steps['model'].feature_importances_
feature_names = pipeline.named_steps['preprocessor'].transformers_[0][1].get_feature_names_out()
feature_importances = pd.Series(importances, index=feature_names).sort_values(ascending=False)
print("Top Features:")
feature_importances.head(10)

feature_importances.head(10).plot(kind='barh', color='skyblue')
plt.title("Top Feature Importances for Downtime Prediction")
plt.xlabel("Importance")
plt.show()
Mean Absolute Error: 50.732367710435256
Mean Squared Error: 12871.74305811443
Top Features:
No description has been provided for this image

The analysis identifies critical features impacting downtime prediction, with production location, maintenance type, and specific functional locations showing high importance. Key production sites like COTA, ROMA, and MONACO emerge as influential, indicating regional differences in maintenance needs. Maintenance types, primarily corrective and preventive maintenance orders, also play a significant role, highlighting the importance of maintenance strategy in predicting downtime. The top feature is PRODUCTION_LOCATION_COTA, which has the highest importance, indicating that the production location at COTA significantly impacts downtime.

In [ ]:
desc_text = data_swire_copy['ORDER_DESCRIPTION'].str.cat(sep=' ')

wc = WordCloud(background_color='black', colormap = 'binary').generate(desc_text)
plt.imshow(wc)
Out[ ]:
<matplotlib.image.AxesImage at 0x7e57a615ca30>
No description has been provided for this image

Prominent words such as "LUB," "MECH," "SUPPORT," "ELEC," "SHUTDOWN," and "REPLACED" indicate key focus areas in maintenance activities. "LUB" and "MECH" suggest that lubrication and mechanical issues are common maintenance concerns. The presence of terms like "SHUTDOWN," "SUPPORT," and "REPLACED" points to frequent shutdowns, the need for additional support, and component replacements, possibly due to equipment wear or complex repairs.

10. Model Selection ¶

Our feature engineering insights, which identified critical factors such as breakdown frequency, Timeline analysis, maintenance duration, equipment categories, and seasonal trends, guide our model selection for predictive maintenance. For time series analysis, the ARIMA model is well-suited to predict future values based on historical data patterns, enabling us to anticipate equipment failures and optimize maintenance schedules. The Kaplan-Meier model provides a survival analysis approach, helping us estimate the probability of a machine operating without failure over time. This model is valuable for understanding the typical lifespan of equipment and identifying when breakdowns are most likely to occur. The Cox proportional hazards model further refines this approach by incorporating multiple covariates, such as equipment type and operational conditions, to predict failure risk dynamically. Together, these models equip us to proactively address machine failures, optimize production, reduce downtime, and minimize maintenance costs, ultimately driving improved operational efficiency and supporting Swire Coca-Cola’s goal of enhancing productivity.

11. Modeling ¶

11.1 ARIMA Model ¶

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'])

# Resample the data to a time series format
time_series_data = data_swire_copy.set_index('EXECUTION_START_DATE').resample('D').size().reset_index(name='count')

# Split the data into training and testing sets
train_size = int(len(time_series_data) * 0.8)
train_y, test_y = time_series_data['count'][:train_size], time_series_data['count'][train_size:]
test_dates = time_series_data['EXECUTION_START_DATE'][train_size:]

# auto_ARIMA to find the best parameters
model_auto_arima = auto_arima(train_y, seasonal=False, stepwise=True, trace=True, error_action='ignore', suppress_warnings=True)
print(f"Best model: {model_auto_arima.order}")

# Fit the ARIMA model with the best parameters
model = ARIMA(train_y, order=model_auto_arima.order)
model_fit = model.fit()

# Forecast future values for the test set
predictions = model_fit.forecast(steps=len(test_y))

# Evaluate the model
rmse = np.sqrt(mean_squared_error(test_y, predictions))
print('Test RMSE:', rmse)

# Plot the results
plt.figure(figsize=(12, 6))
plt.plot(time_series_data['EXECUTION_START_DATE'][:train_size], train_y, label='Training Data', color='blue')
plt.plot(test_dates, test_y, label='Testing Data', color='orange')
plt.plot(test_dates, predictions, label='Predictions', color='red')
plt.legend()
plt.title('ARIMA Time Series Analysis')
plt.xlabel('Date')
plt.ylabel('Number of Maintenance Activities')
plt.show()
Performing stepwise search to minimize aic
 ARIMA(2,1,2)(0,0,0)[0] intercept   : AIC=39752.188, Time=5.22 sec
 ARIMA(0,1,0)(0,0,0)[0] intercept   : AIC=42280.376, Time=0.11 sec
 ARIMA(1,1,0)(0,0,0)[0] intercept   : AIC=42236.805, Time=0.16 sec
 ARIMA(0,1,1)(0,0,0)[0] intercept   : AIC=42116.562, Time=0.93 sec
 ARIMA(0,1,0)(0,0,0)[0]             : AIC=42278.380, Time=0.08 sec
 ARIMA(1,1,2)(0,0,0)[0] intercept   : AIC=40545.764, Time=8.35 sec
 ARIMA(2,1,1)(0,0,0)[0] intercept   : AIC=40146.430, Time=3.16 sec
 ARIMA(3,1,2)(0,0,0)[0] intercept   : AIC=40116.245, Time=9.08 sec
 ARIMA(2,1,3)(0,0,0)[0] intercept   : AIC=inf, Time=9.82 sec
 ARIMA(1,1,1)(0,0,0)[0] intercept   : AIC=41955.414, Time=1.75 sec
 ARIMA(1,1,3)(0,0,0)[0] intercept   : AIC=40552.691, Time=11.12 sec
 ARIMA(3,1,1)(0,0,0)[0] intercept   : AIC=40085.166, Time=5.82 sec
 ARIMA(3,1,3)(0,0,0)[0] intercept   : AIC=inf, Time=13.52 sec
 ARIMA(2,1,2)(0,0,0)[0]             : AIC=39751.330, Time=1.83 sec
 ARIMA(1,1,2)(0,0,0)[0]             : AIC=40547.846, Time=2.00 sec
 ARIMA(2,1,1)(0,0,0)[0]             : AIC=40148.500, Time=3.72 sec
 ARIMA(3,1,2)(0,0,0)[0]             : AIC=inf, Time=5.84 sec
 ARIMA(2,1,3)(0,0,0)[0]             : AIC=inf, Time=5.36 sec
 ARIMA(1,1,1)(0,0,0)[0]             : AIC=41953.417, Time=1.00 sec
 ARIMA(1,1,3)(0,0,0)[0]             : AIC=40523.084, Time=6.46 sec
 ARIMA(3,1,1)(0,0,0)[0]             : AIC=40086.987, Time=2.06 sec
 ARIMA(3,1,3)(0,0,0)[0]             : AIC=39160.040, Time=4.26 sec
 ARIMA(4,1,3)(0,0,0)[0]             : AIC=37997.016, Time=6.70 sec
 ARIMA(4,1,2)(0,0,0)[0]             : AIC=38269.563, Time=7.01 sec
 ARIMA(5,1,3)(0,0,0)[0]             : AIC=37541.256, Time=5.04 sec
 ARIMA(5,1,2)(0,0,0)[0]             : AIC=37659.624, Time=11.61 sec
 ARIMA(5,1,4)(0,0,0)[0]             : AIC=37302.760, Time=17.91 sec
 ARIMA(4,1,4)(0,0,0)[0]             : AIC=inf, Time=14.95 sec
 ARIMA(5,1,5)(0,0,0)[0]             : AIC=inf, Time=12.31 sec
 ARIMA(4,1,5)(0,0,0)[0]             : AIC=37621.240, Time=6.67 sec
 ARIMA(5,1,4)(0,0,0)[0] intercept   : AIC=37306.757, Time=16.53 sec

Best model:  ARIMA(5,1,4)(0,0,0)[0]          
Total fit time: 200.475 seconds
Best model: (5, 1, 4)
/usr/local/lib/python3.10/dist-packages/statsmodels/base/model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "
Test RMSE: 210.78868210140388
No description has been provided for this image

The ARIMA time series analysis forecasts future maintenance activity levels based on historical data patterns. The best-fitting model, ARIMA(5,1,4), was chosen based on its AIC score, indicating an optimal balance between complexity and fit. The training data, represented in blue, shows historical trends, while the red line denotes predictions for the testing period. The orange line represents actual testing data, allowing us to observe the model’s accuracy. The Root Mean Squared Error (RMSE) of approximately 210.79 indicates that, on average, predictions are close to observed values, though some variability remains.

This model provides valuable business insights by enabling the organization to anticipate periods of high maintenance activity. Such predictions can aid in planning and resource allocation, facilitating timely preventive/corrective maintenance and minimizing disruptions. For further improvement, the model could incorporate seasonal adjustments or additional factors (like weather or production cycles) to capture influences on maintenance activity more accurately.

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'])

# Set the datetime column as the index
data_swire_copy.set_index('EXECUTION_START_DATE', inplace=True)

# Resample the data to a time series format
time_series_data = data_swire_copy.resample('D').size()

# Plot the time series
plt.figure(figsize=(12, 6))
plt.plot(time_series_data, label='Number of Maintenance Activities per Day')
plt.title('Time Series of Maintenance Activities')
plt.xlabel('Date')
plt.ylabel('Number of Activities')
plt.legend()
plt.show()

# Fit an ARIMA model
# Use auto_ARIMA to find the best p, d, q parameters
model_auto_arima = auto_arima(time_series_data, seasonal=False, stepwise=True, trace=True)

# Print the best parameters
print(f"Best model: {model_auto_arima.order}")

# Fit the ARIMA model with the best parameters
model = ARIMA(time_series_data, order=model_auto_arima.order)
model_fit = model.fit()

# Print model summary
print(model_fit.summary())

# Forecast future values
forecast = model_fit.forecast(steps=30)  # Forecast for the next 30 days

# Plot the forecast
plt.figure(figsize=(12, 6))
plt.plot(time_series_data, label='Observed')
plt.plot(forecast, label='Forecast', color='red')
plt.title('ARIMA Model Forecast')
plt.xlabel('Date')
plt.ylabel('Number of Activities')
plt.legend()
plt.show()
No description has been provided for this image
Performing stepwise search to minimize aic
 ARIMA(2,1,2)(0,0,0)[0] intercept   : AIC=50417.730, Time=8.34 sec
 ARIMA(0,1,0)(0,0,0)[0] intercept   : AIC=53658.438, Time=0.13 sec
 ARIMA(1,1,0)(0,0,0)[0] intercept   : AIC=53603.376, Time=0.17 sec
 ARIMA(0,1,1)(0,0,0)[0] intercept   : AIC=53456.806, Time=1.22 sec
 ARIMA(0,1,0)(0,0,0)[0]             : AIC=53656.438, Time=0.09 sec
 ARIMA(1,1,2)(0,0,0)[0] intercept   : AIC=51477.106, Time=9.11 sec
 ARIMA(2,1,1)(0,0,0)[0] intercept   : AIC=50957.703, Time=4.48 sec
 ARIMA(3,1,2)(0,0,0)[0] intercept   : AIC=inf, Time=15.35 sec
 ARIMA(2,1,3)(0,0,0)[0] intercept   : AIC=inf, Time=14.48 sec
 ARIMA(1,1,1)(0,0,0)[0] intercept   : AIC=53251.497, Time=3.30 sec
 ARIMA(1,1,3)(0,0,0)[0] intercept   : AIC=51487.928, Time=5.84 sec
 ARIMA(3,1,1)(0,0,0)[0] intercept   : AIC=50867.251, Time=10.44 sec
 ARIMA(3,1,3)(0,0,0)[0] intercept   : AIC=inf, Time=16.24 sec
 ARIMA(2,1,2)(0,0,0)[0]             : AIC=50417.086, Time=2.52 sec
 ARIMA(1,1,2)(0,0,0)[0]             : AIC=51480.764, Time=1.86 sec
 ARIMA(2,1,1)(0,0,0)[0]             : AIC=50961.473, Time=1.43 sec
 ARIMA(3,1,2)(0,0,0)[0]             : AIC=inf, Time=9.94 sec
 ARIMA(2,1,3)(0,0,0)[0]             : AIC=inf, Time=5.22 sec
 ARIMA(1,1,1)(0,0,0)[0]             : AIC=53249.497, Time=0.90 sec
 ARIMA(1,1,3)(0,0,0)[0]             : AIC=51455.468, Time=2.84 sec
 ARIMA(3,1,1)(0,0,0)[0]             : AIC=50870.643, Time=4.31 sec
 ARIMA(3,1,3)(0,0,0)[0]             : AIC=inf, Time=7.38 sec

Best model:  ARIMA(2,1,2)(0,0,0)[0]          
Total fit time: 125.623 seconds
Best model: (2, 1, 2)
/usr/local/lib/python3.10/dist-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
/usr/local/lib/python3.10/dist-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
  warn('Non-invertible starting MA parameters found.'
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                 4030
Model:                 ARIMA(2, 1, 2)   Log Likelihood              -25203.543
Date:                Thu, 07 Nov 2024   AIC                          50417.086
Time:                        22:52:02   BIC                          50448.592
Sample:                    08-19-2013   HQIC                         50428.250
                         - 08-30-2024                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          1.0395      0.012     88.376      0.000       1.016       1.063
ar.L2         -0.7359      0.012    -62.933      0.000      -0.759      -0.713
ma.L1         -1.5455      0.016    -93.977      0.000      -1.578      -1.513
ma.L2          0.5894      0.016     36.272      0.000       0.558       0.621
sigma2      1.587e+04    304.331     52.151      0.000    1.53e+04    1.65e+04
===================================================================================
Ljung-Box (L1) (Q):                   1.76   Jarque-Bera (JB):               291.66
Prob(Q):                              0.18   Prob(JB):                         0.00
Heteroskedasticity (H):              27.34   Skew:                            -0.34
Prob(H) (two-sided):                  0.00   Kurtosis:                         4.13
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
No description has been provided for this image

The time series plot of daily maintenance activities shows a significant increase in maintenance activities over time, particularly after 2016. The number of maintenance activities per day appears to rise and become more variable, suggesting potential operational stress or an increase in asset volume. This could indicate more frequent wear and tear or the addition of more equipment requiring maintenance over the years. The ARIMA model was applied to forecast future maintenance activities.

The ARIMA model, with order (2, 1, 2) based on AIC optimization, generated a 30-day forecast, shown as a red line in the forecast plot. This forecast predicts future maintenance activity levels based on past patterns. The model summary provides statistical insights, with coefficients for autoregressive (AR) and moving average (MA) terms being highly significant (p < 0.001), indicating a good fit. The model's forecast can help maintenance teams anticipate periods of high activity, enabling proactive planning and resource allocation.

11.2 Functional Location Segmentation for Process Analysis ¶

The process involves splitting the "FUNCTIONAL_LOC" column, which contains hierarchical information about each functional location, into six individual columns: "Plant," "Process," "Sub_Process," "Product_Line," "Machine," and "Sub_Machine," allowing each level of the location hierarchy to be accessed independently. Additionally, flag columns ("Plant_Flag," "Process_Flag," etc.) indicate whether each hierarchical level is present ("1") or absent ("0") in each row. This structure supports a detailed analysis of maintenance trends by enabling insights into how different location levels (such as specific plants or machines) contribute to maintenance needs or downtime, facilitating a more targeted approach to data exploration and predictive modeling.

In [ ]:
# Splitting the 'FUNCTIONAL_LOC' column into separate node columns
data_swire_copy[['Plant', 'Process', 'Sub_Process', 'Product_Line', 'Machine', 'Sub_Machine']] = data_swire_copy['FUNCTIONAL_LOC'].str.split('-', expand=True)

# Display the first few rows to check the split
print(data_swire_copy[['FUNCTIONAL_LOC', 'Plant', 'Process', 'Sub_Process', 'Product_Line', 'Machine', 'Sub_Machine']].head())

# Creating flag columns to indicate the presence of each node
# If a column has a value, it is flagged as 1 (present), otherwise 0 (absent)
data_swire_copy['Plant_Flag'] = data_swire_copy['Plant'].notnull().astype(int)
data_swire_copy['Process_Flag'] = data_swire_copy['Process'].notnull().astype(int)
data_swire_copy['Sub_Process_Flag'] = data_swire_copy['Sub_Process'].notnull().astype(int)
data_swire_copy['Product_Line_Flag'] = data_swire_copy['Product_Line'].notnull().astype(int)
data_swire_copy['Machine_Flag'] = data_swire_copy['Machine'].notnull().astype(int)
data_swire_copy['Sub_Machine_Flag'] = data_swire_copy['Sub_Machine'].notnull().astype(int)

# Display the updated DataFrame structure
print("\nUpdated DataFrame with flags:")
print(data_swire_copy[['Plant', 'Plant_Flag', 'Process', 'Process_Flag', 'Sub_Process', 'Sub_Process_Flag']].head())
              FUNCTIONAL_LOC Plant Process Sub_Process Product_Line Machine  \
0  G812-PRD-L15-L03-L030-020  G812     PRD         L15          L03    L030   
1  G812-PRD-L15-L03-L030-020  G812     PRD         L15          L03    L030   
2  G812-PRD-L15-L03-L030-020  G812     PRD         L15          L03    L030   
3  G812-PRD-L15-L03-L030-020  G812     PRD         L15          L03    L030   
4               G291-PRD-P05  G291     PRD         P05         None    None   

  Sub_Machine  
0         020  
1         020  
2         020  
3         020  
4        None  

Updated DataFrame with flags:
  Plant  Plant_Flag Process  Process_Flag Sub_Process  Sub_Process_Flag
0  G812           1     PRD             1         L15                 1
1  G812           1     PRD             1         L15                 1
2  G812           1     PRD             1         L15                 1
3  G812           1     PRD             1         L15                 1
4  G291           1     PRD             1         P05                 1
In [ ]:
# Checking unique values in 'MAINTENANCE_ACTIVITY_TYPE' to understand the categories
print("Unique values in 'MAINTENANCE_ACTIVITY_TYPE':")
print(data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'].unique())

# Assuming that 'Planned' and 'Unplanned' are identifiable in the 'MAINTENANCE_ACTIVITY_TYPE' column
# Replace 'Planned' and 'Unplanned' with the actual values from the dataset that represent these activities

# Filtering planned maintenance data
planned_data = data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'].str.contains('Planned', case=False, na=False)]

# Filtering unplanned maintenance data
unplanned_data = data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'].str.contains('Unplanned', case=False, na=False)]

# Display the counts for verification
print(f"Number of unplanned maintenance records: {len(unplanned_data)}")

# Display first few rows of unplanned dataset for confirmation
print("\nFirst few rows of unplanned data:")
print(unplanned_data.head())
Unique values in 'MAINTENANCE_ACTIVITY_TYPE':
['Unplanned' 'Planned']
Number of unplanned maintenance records: 1276877

First few rows of unplanned data:
    ORDER_ID PLANT_ID PRODUCTION_LOCATION EXECUTION_START_DATE  \
0  705642457     G812                ROMA           2024-05-04   
1  704191697     G812                ROMA           2022-09-13   
2  704466547     G812                ROMA           2022-12-21   
3  703834477     G812                ROMA           2022-07-04   
4  704661125     G291               MONZA           2023-03-15   

  EXECUTION_FINISH_DATE ACTUAL_START_TIME ACTUAL_FINISH_TIME  \
0            2024-05-12      06:00:00.000       23:04:08.000   
1            2022-09-13      06:00:00.000       17:17:24.000   
2            2022-12-21      07:00:00.000       07:00:00.000   
3            2022-07-04      06:00:00.000       06:00:00.000   
4            2023-03-15      07:00:00.000       07:00:00.000   

   ACTUAL_WORK_IN_MINUTES MAINTENANCE_PLAN  MAINTENANCE_ITEM  ...  \
0                   390.0              NaN               NaN  ...   
1                   420.0              NaN               NaN  ...   
2                    30.0              NaN               NaN  ...   
3                    60.0              NaN               NaN  ...   
4                   120.0              NaN               NaN  ...   

  FUNCTIONAL_AREA_NODE_2_MODIFIED    FUNCTIONAL_AREA_NODE_3_MODIFIED  \
0      G812 PRD FILLER_ROTARY_CAN  G812 PRD FILLER_ROTARY_CAN LINE 3   
1      G812 PRD FILLER_ROTARY_CAN  G812 PRD FILLER_ROTARY_CAN LINE 3   
2      G812 PRD FILLER_ROTARY_CAN  G812 PRD FILLER_ROTARY_CAN LINE 3   
3      G812 PRD FILLER_ROTARY_CAN  G812 PRD FILLER_ROTARY_CAN LINE 3   
4                     AIR SYSTEMS                                NaN   

  FUNCTIONAL_AREA_NODE_4_MODIFIED FUNCTIONAL_AREA_NODE_5_MODIFIED  \
0                        CONVEYOR               FULL CAN CONVEYOR   
1                        CONVEYOR               FULL CAN CONVEYOR   
2                        CONVEYOR               FULL CAN CONVEYOR   
3                        CONVEYOR               FULL CAN CONVEYOR   
4                             NaN                             NaN   

  EQUIPMENT_ID             EQUIPMENT_DESC EQUIP_CAT_DESC EQUIP_START_UP_DATE  \
0  300126812.0  FULL CAN CONVEYOR (FC41B)       Machines          2020-04-22   
1  300126812.0  FULL CAN CONVEYOR (FC41B)       Machines          2020-04-22   
2  300126812.0  FULL CAN CONVEYOR (FC41B)       Machines          2020-04-22   
3  300126812.0  FULL CAN CONVEYOR (FC41B)       Machines          2020-04-22   
4  300001088.0          TANK_STL_STOR_AIR       Machines          2017-02-15   

  EQUIP_VALID_FROM  EQUIP_VALID_TO  
0       2020-02-10      9999-12-31  
1       2020-02-10      9999-12-31  
2       2020-02-10      9999-12-31  
3       2020-02-10      9999-12-31  
4       2023-12-11      9999-12-31  

[5 rows x 25 columns]

The output shows that there are 1,276,877 records of unplanned maintenance activities in the dataset. By filtering and examining the initial rows of these unplanned activities, we confirm the dataset's structure, which allows for focused analysis on unplanned maintenance trends and patterns.

11.3 Kaplan-Meier Model ¶

In [ ]:
# Filter for unplanned maintenance data
df_unplanned = data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'].str.contains('Unplanned', case=False, na=False)]

# Filter rows where parts have been replaced based on the 'ORDER_DESCRIPTION' column
replace_rows = df_unplanned[df_unplanned['ORDER_DESCRIPTION'].str.contains('replace', case=False, na=False)]

# Convert the 'EXECUTION_START_DATE' column to datetime format
replace_rows['EXECUTION_START_DATE'] = pd.to_datetime(replace_rows['EXECUTION_START_DATE'], errors='coerce')

# Group by 'MAINTENANCE_TYPE_DESCRIPTION' to understand counts of replacements
grouped_replace = replace_rows.groupby('MAINTENANCE_TYPE_DESCRIPTION').size()
print("Counts of parts replaced by maintenance type:\n", grouped_replace)

# Calculate the new duration using the fixed future date
fixed_future_date = replace_rows['EXECUTION_START_DATE'].max()
replace_rows['duration_new'] = (fixed_future_date - replace_rows['EXECUTION_START_DATE']).dt.days

# Create an 'event_new' column indicating unplanned maintenance occurrence (1 for event)
replace_rows['event_new'] = 1  # Since replace_rows only has rows with unplanned events

# Initialize the Kaplan-Meier fitter
kmf = KaplanMeierFitter()

# Dictionary to store median survival times for each maintenance type
median_survival_times = {}

# List of key maintenance types for analysis
maintenance_types = [
    'Corrective Maintenance Order',
    'Administrative Order',
    'Preventive Maintenance Order'
]

# Plot Kaplan-Meier survival curves for each type of unplanned maintenance
plt.figure(figsize=(12, 8))

# Loop over each maintenance type and fit the Kaplan-Meier model
for maintenance_type in maintenance_types:
    if maintenance_type in replace_rows['MAINTENANCE_TYPE_DESCRIPTION'].unique():
        # Filter data for the current maintenance type
        type_df = replace_rows[replace_rows['MAINTENANCE_TYPE_DESCRIPTION'] == maintenance_type]

        # Fit the Kaplan-Meier model with duration and event data
        kmf.fit(durations=type_df['duration_new'], event_observed=type_df['event_new'], label=maintenance_type)

        # Store and print the median survival time for each type
        median_survival_times[maintenance_type] = kmf.median_survival_time_
        print(f"Median Survival Time for {maintenance_type}: {kmf.median_survival_time_} days")

        # Plot the survival function for this maintenance type
        kmf.plot_survival_function(ci_show=False)

# Label the plot for better visualization
plt.title("Kaplan-Meier Survival Curves for Different Maintenance Types (New Duration)")
plt.xlabel("Days Since Part Replacement")
plt.ylabel("Survival Probability")
plt.legend()
plt.grid(True)
plt.show()

# Output the calculated median survival times
print("\nMedian Survival Times (in days):")
for maintenance_type, median_time in median_survival_times.items():
    print(f"{maintenance_type}: {median_time} days")
<ipython-input-53-513457c13a07>:12: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  replace_rows['EXECUTION_START_DATE'] = pd.to_datetime(replace_rows['EXECUTION_START_DATE'], errors='coerce')
<ipython-input-53-513457c13a07>:24: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  replace_rows['duration_new'] = (fixed_future_date - replace_rows['EXECUTION_START_DATE']).dt.days
<ipython-input-53-513457c13a07>:27: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  replace_rows['event_new'] = 1  # Since replace_rows only has rows with unplanned events
Counts of parts replaced by maintenance type:
 MAINTENANCE_TYPE_DESCRIPTION
Administrative Order              266
Corrective Maintenance Order    23060
Preventive Maintenance Order       67
dtype: int64
Median Survival Time for Corrective Maintenance Order: 1231.0 days
Median Survival Time for Administrative Order: 364.0 days
Median Survival Time for Preventive Maintenance Order: 1288.0 days
No description has been provided for this image
Median Survival Times (in days):
Corrective Maintenance Order: 1231.0 days
Administrative Order: 364.0 days
Preventive Maintenance Order: 1288.0 days

The Kaplan-Meier survival analysis for different unplanned maintenance types provides insights into the longevity of parts after replacement. "Corrective Maintenance Orders" show a median survival time of 1,231 days, indicating that parts replaced due to corrective issues tend to last over three years on average before needing further maintenance. "Preventive Maintenance Orders" exhibit a slightly longer median survival time of 1,288 days, suggesting that proactively replaced parts have a marginally higher longevity, supporting the effectiveness of preventive maintenance strategies. In contrast, "Administrative Orders" have a much shorter median survival time of 364 days, reflecting that parts replaced for administrative reasons are more likely to require additional interventions within a year. These insights underscore the durability advantage of preventive maintenance over corrective or administrative actions, highlighting the value of a proactive approach to extend part life and minimize disruptions.

11.4 Cox Proportional Model ¶

In [ ]:
# Filter for unplanned maintenance data
df_unplanned = data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'].str.contains('Unplanned', case=False, na=False)]

# Filter rows where parts have been replaced based on the 'ORDER_DESCRIPTION' column
replace_rows = df_unplanned[df_unplanned['ORDER_DESCRIPTION'].str.contains('replace', case=False, na=False)]

# Convert 'EXECUTION_START_DATE' to datetime format
replace_rows['EXECUTION_START_DATE'] = pd.to_datetime(replace_rows['EXECUTION_START_DATE'], errors='coerce')

# Drop rows with NaT in 'EXECUTION_START_DATE'
replace_rows = replace_rows.dropna(subset=['EXECUTION_START_DATE'])

# Calculate the new duration using the fixed future date
fixed_future_date = replace_rows['EXECUTION_START_DATE'].max()
replace_rows['duration_new'] = (fixed_future_date - replace_rows['EXECUTION_START_DATE']).dt.days

# Create an 'event_new' column indicating unplanned maintenance occurrence (1 for event)
replace_rows['event_new'] = 1  # All rows in 'replace_rows' are events

# Encode categorical variables as dummies for Cox model
cox_data = replace_rows[['duration_new', 'event_new', 'MAINTENANCE_TYPE_DESCRIPTION']]
cox_data = pd.get_dummies(cox_data, drop_first=True)

# Fit the Cox Proportional Hazards model
cph = CoxPHFitter()
cph.fit(cox_data, duration_col='duration_new', event_col='event_new')

# Print the summary of the Cox model
cph.print_summary()

# Print columns in Cox model data to identify dummy variable names
print("\nColumns in Cox model data:", cox_data.columns)

# Extract the dummy variable names for plotting
dummy_names = cox_data.columns[cox_data.columns.str.startswith('MAINTENANCE_TYPE_DESCRIPTION')]

# Plot the partial effects for the dummy variables
plt.figure(figsize=(12, 8))
for dummy_name in dummy_names:
    cph.plot_partial_effects_on_outcome(dummy_name, values=[0, 1], plot_baseline=False)

plt.title("Cox Proportional Hazards Model - Effect of Maintenance Types")
plt.xlabel("Days")
plt.ylabel("Survival Probability")
plt.grid(True)
plt.show()

# Output the summary of median survival times
print("\nMedian survival times and impacts have been evaluated based on Cox PH analysis.")


# Calculate the median survival times for different maintenance types
median_survival_times_cox = {}

for dummy_name in dummy_names:
    # Create a DataFrame to predict median survival with a specific covariate set to 1
    profile_df = pd.DataFrame(0, index=[0], columns=cox_data.columns)
    profile_df[dummy_name] = 1  # Set the specific maintenance type dummy to 1

    # Predict median survival time for the given profile
    median_time = cph.predict_median(profile_df)
    median_survival_times_cox[dummy_name] = median_time
    print(f"Median Survival Time for {dummy_name}: {median_time} days")

# Output the calculated median survival times
print("\nMedian Survival Times (in days) from Cox Model:")
for maintenance_type, median_time in median_survival_times_cox.items():
    print(f"{maintenance_type}: {median_time:.2f} days")
<ipython-input-54-5854a0746f36>:12: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  replace_rows['EXECUTION_START_DATE'] = pd.to_datetime(replace_rows['EXECUTION_START_DATE'], errors='coerce')
model lifelines.CoxPHFitter
duration col 'duration_new'
event col 'event_new'
baseline estimation breslow
number of observations 23393
number of events observed 23393
partial log-likelihood -211783.00
time fit was run 2024-11-07 22:57:58 UTC
coef exp(coef) se(coef) coef lower 95% coef upper 95% exp(coef) lower 95% exp(coef) upper 95% cmp to z p -log2(p)
MAINTENANCE_TYPE_DESCRIPTION_Corrective Maintenance Order -1.39 0.25 0.06 -1.51 -1.26 0.22 0.28 0.00 -22.31 <0.005 363.93
MAINTENANCE_TYPE_DESCRIPTION_Preventive Maintenance Order -1.52 0.22 0.14 -1.79 -1.25 0.17 0.29 0.00 -11.10 <0.005 92.75

Concordance 0.51
Partial AIC 423569.99
log-likelihood ratio test 336.05 on 2 df
-log2(p) of ll-ratio test 242.41
Columns in Cox model data: Index(['duration_new', 'event_new',
       'MAINTENANCE_TYPE_DESCRIPTION_Corrective Maintenance Order',
       'MAINTENANCE_TYPE_DESCRIPTION_Preventive Maintenance Order'],
      dtype='object')
<Figure size 1200x800 with 0 Axes>
No description has been provided for this image
No description has been provided for this image
Median survival times and impacts have been evaluated based on Cox PH analysis.
Median Survival Time for MAINTENANCE_TYPE_DESCRIPTION_Corrective Maintenance Order: 1232.0 days
Median Survival Time for MAINTENANCE_TYPE_DESCRIPTION_Preventive Maintenance Order: 1352.0 days

Median Survival Times (in days) from Cox Model:
MAINTENANCE_TYPE_DESCRIPTION_Corrective Maintenance Order: 1232.00 days
MAINTENANCE_TYPE_DESCRIPTION_Preventive Maintenance Order: 1352.00 days

The Cox Proportional Hazards model analysis provides insights into the impact of different maintenance types on the survival time of parts replaced during unplanned maintenance events. Specifically, the "Corrective Maintenance Order" and "Preventive Maintenance Order" maintenance types were evaluated to determine their influence on the duration until the next maintenance activity. The negative coefficients for both "Corrective Maintenance Order" (-1.39) and "Preventive Maintenance Order" (-1.52) suggest that these maintenance types significantly reduce the hazard rate, thereby extending the expected survival time between replacements. In terms of median survival times, parts under "Corrective Maintenance Order" have an expected median survival of approximately 1232 days, while those under "Preventive Maintenance Order" are expected to last about 1352 days. This implies that preventive measures contribute to a slightly longer duration between replacements compared to corrective actions, underscoring the potential value of preventive maintenance in minimizing downtime and extending part longevity. The survival curves for these maintenance types further illustrate these differences, with preventive maintenance demonstrating a slower decline in survival probability over time compared to corrective maintenance.

11.5 Leveraging Kaplan-Meier Model for Enhanced Insights ¶

In [ ]:
# Fixed future date for censoring purposes
fixed_future_date = pd.to_datetime("2024-08-30")

# Calculate the duration to the fixed future date
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])  # Drop rows with NaT in EXECUTION_START_DATE
data_swire_copy['duration_new'] = (fixed_future_date - data_swire_copy['EXECUTION_START_DATE']).dt.days
data_swire_copy['event_new'] = 1  # Assuming all rows represent events

# Dictionary to store median survival times for each level
median_survival_times_by_level = {}

# List of hierarchical columns for analysis
hierarchical_levels = ['Plant', 'Process', 'Sub_Process', 'Product_Line', 'Machine', 'Sub_Machine']

# Loop through each level to calculate Kaplan-Meier curves and store median survival times
for level in hierarchical_levels:
    kmf = KaplanMeierFitter()
    median_survival_times_by_level[level] = {}

    for group in data_swire_copy[level].dropna().unique():  # Drop NaN and iterate over unique groups
        group_df = data_swire_copy[data_swire_copy[level] == group]

        # Fit Kaplan-Meier model for each group
        kmf.fit(durations=group_df['duration_new'], event_observed=group_df['event_new'], label=str(group))

        # Store median survival time for each group in the current level
        median_survival_times_by_level[level][group] = kmf.median_survival_time_

# Display top 10 and lowest 10 median survival times for each level
for level in hierarchical_levels:
    print(f"\nTop 10 Median Survival Times for {level} (in days):")
    top_10 = sorted(median_survival_times_by_level[level].items(), key=lambda x: x[1], reverse=True)[:10]
    for group, median_time in top_10:
        print(f"{level} - {group}: {median_time} days")

    print(f"\nLowest 10 Median Survival Times for {level} (in days):")
    low_10 = sorted(median_survival_times_by_level[level].items(), key=lambda x: x[1])[:10]
    for group, median_time in low_10:
        print(f"{level} - {group}: {median_time} days")

    # Plot the survival curves for the top 10 and lowest 10 groups
    plt.figure(figsize=(14, 8))
    kmf = KaplanMeierFitter()

    # Plot the top 10 groups
    for group, _ in top_10:
        group_df = data_swire_copy[data_swire_copy[level] == group]
        kmf.fit(durations=group_df['duration_new'], event_observed=group_df['event_new'], label=f"{level} - {group}")
        kmf.plot_survival_function(ci_show=False)

    plt.title(f"Kaplan-Meier Survival Curves for Top 10 {level} Groups by Median Survival Time")
    plt.xlabel("Days Since Last Maintenance")
    plt.ylabel("Survival Probability")
    plt.legend(title='Groups')
    plt.grid(True)
    plt.show()

    plt.figure(figsize=(14, 8))

    # Plot the lowest 10 groups
    for group, _ in low_10:
        group_df = data_swire_copy[data_swire_copy[level] == group]
        kmf.fit(durations=group_df['duration_new'], event_observed=group_df['event_new'], label=f"{level} - {group}")
        kmf.plot_survival_function(ci_show=False)

    plt.title(f"Kaplan-Meier Survival Curves for Lowest 10 {level} Groups by Median Survival Time")
    plt.xlabel("Days Since Last Maintenance")
    plt.ylabel("Survival Probability")
    plt.legend(title='Groups')
    plt.grid(True)
    plt.show()
Top 10 Median Survival Times for Plant (in days):
Plant - G221: 1505.0 days
Plant - G291: 1387.0 days
Plant - G261: 1244.0 days
Plant - G816: 1143.0 days
Plant - G811: 821.0 days
Plant - G812: 772.0 days
Plant - G999: 123.0 days

Lowest 10 Median Survival Times for Plant (in days):
Plant - G999: 123.0 days
Plant - G812: 772.0 days
Plant - G811: 821.0 days
Plant - G816: 1143.0 days
Plant - G261: 1244.0 days
Plant - G291: 1387.0 days
Plant - G221: 1505.0 days
No description has been provided for this image
No description has been provided for this image
Top 10 Median Survival Times for Process (in days):
Process - CLR: 1418.0 days
Process - FLT: 1376.0 days
Process - PRD: 1198.0 days
Process - SLC: 123.0 days

Lowest 10 Median Survival Times for Process (in days):
Process - SLC: 123.0 days
Process - PRD: 1198.0 days
Process - FLT: 1376.0 days
Process - CLR: 1418.0 days
No description has been provided for this image
No description has been provided for this image
Top 10 Median Survival Times for Sub_Process (in days):
Sub_Process - P30: 2176.0 days
Sub_Process - L35: 1590.0 days
Sub_Process - A50: 1575.0 days
Sub_Process - P05: 1483.0 days
Sub_Process - A85: 1418.0 days
Sub_Process - F10: 1376.0 days
Sub_Process - P90: 1362.0 days
Sub_Process - A99: 1362.0 days
Sub_Process - P55: 1354.0 days
Sub_Process - L30: 1304.0 days

Lowest 10 Median Survival Times for Sub_Process (in days):
Sub_Process - P76: 143.0 days
Sub_Process - P95: 204.0 days
Sub_Process - B30: 230.0 days
Sub_Process - A40: 257.0 days
Sub_Process - A05: 267.0 days
Sub_Process - P50: 294.0 days
Sub_Process - A55: 339.0 days
Sub_Process - P91: 342.0 days
Sub_Process - P77: 373.0 days
Sub_Process - A28: 427.0 days
No description has been provided for this image
No description has been provided for this image
Top 10 Median Survival Times for Product_Line (in days):
Product_Line - L08: 1637.0 days
Product_Line - 006: 1585.0 days
Product_Line - I06: 1575.0 days
Product_Line - E06: 1412.0 days
Product_Line - 002: 1376.0 days
Product_Line - L04: 1329.0 days
Product_Line - L01: 1300.0 days
Product_Line - L02: 1293.0 days
Product_Line - S02: 1277.0 days
Product_Line - L09: 1256.0 days

Lowest 10 Median Survival Times for Product_Line (in days):
Product_Line - L10: 21.0 days
Product_Line - S04: 42.0 days
Product_Line - W15: 61.0 days
Product_Line - H00: 120.0 days
Product_Line - E01: 121.0 days
Product_Line - I08: 123.0 days
Product_Line - 020: 123.0 days
Product_Line - W01: 130.0 days
Product_Line - E03: 141.0 days
Product_Line - W09: 165.0 days
No description has been provided for this image
No description has been provided for this image
Top 10 Median Survival Times for Machine (in days):
Machine - S250: 2176.0 days
Machine - S280: 1727.0 days
Machine - S090: 1695.0 days
Machine - L160: 1681.0 days
Machine - 0007: 1638.0 days
Machine - L010: 1616.0 days
Machine - L170: 1592.0 days
Machine - L070: 1576.0 days
Machine - S080: 1552.0 days
Machine - L100: 1525.0 days

Lowest 10 Median Survival Times for Machine (in days):
Machine - S005: 4.0 days
Machine - C030: 5.0 days
Machine - C040: 43.0 days
Machine - S002: 45.0 days
Machine - L230: 59.0 days
Machine - S004: 75.0 days
Machine - S000: 89.0 days
Machine - S030: 97.0 days
Machine - L220: 108.0 days
Machine - S001: 117.0 days
No description has been provided for this image
No description has been provided for this image
Top 10 Median Survival Times for Sub_Machine (in days):
Sub_Machine - WEST: 1630.0 days
Sub_Machine - L010: 1234.0 days
Sub_Machine - 0020: 1196.0 days
Sub_Machine - 0040: 1169.0 days
Sub_Machine - 0035: 1123.0 days
Sub_Machine - 0030: 1001.0 days
Sub_Machine - 0050: 924.0 days
Sub_Machine - 0010: 857.0 days
Sub_Machine - 0060: 830.0 days
Sub_Machine - L040: 745.0 days

Lowest 10 Median Survival Times for Sub_Machine (in days):
Sub_Machine - 050: 34.0 days
Sub_Machine - INS: 129.0 days
Sub_Machine - ACMI: 169.0 days
Sub_Machine - 0025: 170.0 days
Sub_Machine - 0200: 174.0 days
Sub_Machine - MEAD: 178.0 days
Sub_Machine - OUT: 304.0 days
Sub_Machine - 010: 360.0 days
Sub_Machine - 0022: 371.0 days
Sub_Machine - 040: 402.0 days
No description has been provided for this image
No description has been provided for this image

The Kaplan-Meier survival analysis provides insights into the median survival times for various maintenance hierarchy levels in terms of days since the last maintenance. For "Plant" groups, the median survival times range from 123 days for "Plant - G999" to 1505 days for "Plant - G221," indicating substantial variability in survival across plants. In the "Process" level, "Process - CLR" has the highest median survival of 1418 days, while "Process - SLC" has the lowest at 123 days, suggesting certain processes may require more frequent maintenance than others.

The "Sub_Process" level shows median survival times from 143 days for "Sub_Process - P76" to 2176 days for "Sub_Process - P30," illustrating variability in maintenance needs across sub-processes. Similarly, for "Product_Line," median survival times vary widely, with "Product_Line - L08" showing the longest survival at 1637 days and "Product_Line - L10" the shortest at just 21 days. At the "Machine" level, "Machine - S250" has the longest survival time of 2176 days, contrasting sharply with "Machine - S005" at just 4 days. Finally, "Sub_Machine" median survival times range from 34 days for "Sub_Machine - 050" to 1630 days for "Sub_Machine - WEST." These findings highlight that specific plants, processes, product lines, and machines differ significantly in maintenance durability, suggesting targeted maintenance strategies may optimize overall system longevity.

In [ ]:
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE'])  # Drop rows with NaT in EXECUTION_START_DATE

# Dictionary to store median survival times for each level
median_survival_times_by_level = {}

# List of hierarchical columns for analysis
hierarchical_levels = ['Plant', 'Process', 'Sub_Process', 'Product_Line', 'Machine', 'Sub_Machine']

# Loop through each level to calculate Kaplan-Meier curves and store median survival times
for level in hierarchical_levels:
    kmf = KaplanMeierFitter()
    median_survival_times_by_level[level] = {}

    # Find the latest maintenance date for each group within the level
    last_dates = data_swire_copy.groupby(level)['EXECUTION_START_DATE'].max()

    # Calculate duration for each row by subtracting EXECUTION_START_DATE from the latest date for each group
    data_swire_copy['duration_new'] = data_swire_copy.apply(
        lambda row: (last_dates[row[level]] - row['EXECUTION_START_DATE']).days
        if row[level] in last_dates else np.nan,
        axis=1
    )

    # Create an 'event_new' column indicating unplanned maintenance occurrence (1 for event)
    data_swire_copy['event_new'] = 1  # Assuming all rows represent events

    # Loop over each group in the current hierarchical level to fit the Kaplan-Meier model
    for group in data_swire_copy[level].dropna().unique():  # Drop NaN and iterate over unique groups
        group_df = data_swire_copy[data_swire_copy[level] == group]

        # Fit Kaplan-Meier model for each group
        kmf.fit(durations=group_df['duration_new'], event_observed=group_df['event_new'], label=str(group))

        # Store median survival time for each group in the current level
        median_survival_times_by_level[level][group] = kmf.median_survival_time_

    # Display top 10 and lowest 10 median survival times for each level
    print(f"\nTop 10 Median Survival Times for {level} (in days):")
    top_10 = sorted(median_survival_times_by_level[level].items(), key=lambda x: x[1], reverse=True)[:10]
    for group, median_time in top_10:
        print(f"{level} - {group}: {median_time} days")

    print(f"\nLowest 10 Median Survival Times for {level} (in days):")
    low_10 = sorted(median_survival_times_by_level[level].items(), key=lambda x: x[1])[:10]
    for group, median_time in low_10:
        print(f"{level} - {group}: {median_time} days")

    # Plot the survival curves for the top 10 and lowest 10 groups
    plt.figure(figsize=(14, 8))
    kmf = KaplanMeierFitter()

    # Plot the top 10 groups
    for group, _ in top_10:
        group_df = data_swire_copy[data_swire_copy[level] == group]
        kmf.fit(durations=group_df['duration_new'], event_observed=group_df['event_new'], label=f"{level} - {group}")
        kmf.plot_survival_function(ci_show=False)

    plt.title(f"Kaplan-Meier Survival Curves for Top 10 {level} Groups by Median Survival Time")
    plt.xlabel("Days Since Last Maintenance")
    plt.ylabel("Survival Probability")
    plt.legend(title='Groups')
    plt.grid(True)
    plt.show()

    plt.figure(figsize=(14, 8))

    # Plot the lowest 10 groups
    for group, _ in low_10:
        group_df = data_swire_copy[data_swire_copy[level] == group]
        kmf.fit(durations=group_df['duration_new'], event_observed=group_df['event_new'], label=f"{level} - {group}")
        kmf.plot_survival_function(ci_show=False)

    plt.title(f"Kaplan-Meier Survival Curves for Lowest 10 {level} Groups by Median Survival Time")
    plt.xlabel("Days Since Last Maintenance")
    plt.ylabel("Survival Probability")
    plt.legend(title='Groups')
    plt.grid(True)
    plt.show()
Top 10 Median Survival Times for Plant (in days):
Plant - G221: 1504.0 days
Plant - G291: 1386.0 days
Plant - G261: 1243.0 days
Plant - G816: 1142.0 days
Plant - G811: 820.0 days
Plant - G812: 771.0 days
Plant - G999: 121.0 days

Lowest 10 Median Survival Times for Plant (in days):
Plant - G999: 121.0 days
Plant - G812: 771.0 days
Plant - G811: 820.0 days
Plant - G816: 1142.0 days
Plant - G261: 1243.0 days
Plant - G291: 1386.0 days
Plant - G221: 1504.0 days
No description has been provided for this image
No description has been provided for this image
Top 10 Median Survival Times for Process (in days):
Process - CLR: 1417.0 days
Process - PRD: 1197.0 days
Process - FLT: 907.0 days
Process - SLC: 121.0 days

Lowest 10 Median Survival Times for Process (in days):
Process - SLC: 121.0 days
Process - FLT: 907.0 days
Process - PRD: 1197.0 days
Process - CLR: 1417.0 days
No description has been provided for this image
No description has been provided for this image
Top 10 Median Survival Times for Sub_Process (in days):
Sub_Process - P30: 2141.0 days
Sub_Process - A50: 1573.0 days
Sub_Process - P05: 1481.0 days
Sub_Process - A85: 1417.0 days
Sub_Process - P90: 1360.0 days
Sub_Process - L35: 1349.0 days
Sub_Process - P55: 1336.0 days
Sub_Process - A99: 1336.0 days
Sub_Process - L30: 1303.0 days
Sub_Process - P85: 1284.0 days

Lowest 10 Median Survival Times for Sub_Process (in days):
Sub_Process - P60: 0.0 days
Sub_Process - P95: 0.0 days
Sub_Process - PRD: 0.0 days
Sub_Process - P76: 136.0 days
Sub_Process - B30: 229.0 days
Sub_Process - A40: 255.0 days
Sub_Process - A05: 264.0 days
Sub_Process - P50: 287.0 days
Sub_Process - A55: 338.0 days
Sub_Process - P91: 341.0 days
No description has been provided for this image
No description has been provided for this image
Top 10 Median Survival Times for Product_Line (in days):
Product_Line - I06: 1554.0 days
Product_Line - E06: 1411.0 days
Product_Line - L08: 1396.0 days
Product_Line - L04: 1328.0 days
Product_Line - L01: 1299.0 days
Product_Line - L02: 1292.0 days
Product_Line - L09: 1255.0 days
Product_Line - S02: 1247.0 days
Product_Line - R03: 1232.0 days
Product_Line - XXX: 1206.0 days

Lowest 10 Median Survival Times for Product_Line (in days):
Product_Line - S04: 0.0 days
Product_Line - 006: 0.0 days
Product_Line - E05: 0.0 days
Product_Line - E01: 2.0 days
Product_Line - L10: 20.0 days
Product_Line - W15: 57.0 days
Product_Line - W11: 64.0 days
Product_Line - W01: 66.0 days
Product_Line - H00: 117.0 days
Product_Line - I08: 119.0 days
No description has been provided for this image
No description has been provided for this image
Top 10 Median Survival Times for Machine (in days):
Machine - S250: 2141.0 days
Machine - L160: 1680.0 days
Machine - S090: 1669.0 days
Machine - L010: 1615.0 days
Machine - L170: 1591.0 days
Machine - L070: 1574.0 days
Machine - S080: 1549.0 days
Machine - L100: 1524.0 days
Machine - 0003: 1446.0 days
Machine - S280: 1426.0 days

Lowest 10 Median Survival Times for Machine (in days):
Machine - S002: 0.0 days
Machine - S005: 0.0 days
Machine - S050: 0.0 days
Machine - C040: 0.0 days
Machine - C030: 0.0 days
Machine - L230: 0.0 days
Machine - L00: 14.0 days
Machine - S030: 19.0 days
Machine - S010: 64.0 days
Machine - S004: 68.0 days
No description has been provided for this image
No description has been provided for this image
Top 10 Median Survival Times for Sub_Machine (in days):
Sub_Machine - 0020: 1191.0 days
Sub_Machine - WEST: 1005.0 days
Sub_Machine - 0035: 997.0 days
Sub_Machine - 0030: 969.0 days
Sub_Machine - 0050: 915.0 days
Sub_Machine - 0010: 846.0 days
Sub_Machine - L010: 744.0 days
Sub_Machine - L040: 731.0 days
Sub_Machine - L030: 705.0 days
Sub_Machine - 0040: 692.0 days

Lowest 10 Median Survival Times for Sub_Machine (in days):
Sub_Machine - B010: 0.0 days
Sub_Machine - B111: 0.0 days
Sub_Machine - 050: 0.0 days
Sub_Machine - B112: 8.0 days
Sub_Machine - 0200: 95.0 days
Sub_Machine - OUT: 104.0 days
Sub_Machine - INS: 120.0 days
Sub_Machine - ACMI: 161.0 days
Sub_Machine - 0025: 162.0 days
Sub_Machine - 030: 175.0 days
No description has been provided for this image
No description has been provided for this image

The Kaplan-Meier survival analysis reveals significant differences in maintenance survival times across hierarchical levels within the dataset. At the "Plant" level, the longest median survival time is observed for "Plant - G221" at 1504 days, while "Plant - G999" has the shortest at 121 days, suggesting variability in maintenance needs among plants. The "Process" level similarly shows variation, with "Process - CLR" having the longest median survival at 1417 days and "Process - SLC" the shortest at 121 days, indicating certain processes may be more maintenance-intensive.

For "Sub_Process," survival times range from 2141 days for "Sub_Process - P30" to 0 days for several processes, including "Sub_Process - P60" and "P95," implying these sub-processes may require immediate or frequent maintenance. At the "Product_Line" level, "Product_Line - I06" shows a high median survival of 1554 days, contrasting sharply with "Product_Line - S04," which has a survival of 0 days. This suggests some product lines are more durable than others. The "Machine" level has "Machine - S250" with the longest survival at 2141 days, while several machines, including "S002" and "S005," have 0 days, indicating they may be prone to immediate breakdown or frequent servicing.

Finally, for "Sub_Machine," "Sub_Machine - 0020" has the longest survival at 1191 days, while others like "B010" and "B111" have 0 days, highlighting variations in the durability of different sub-machines. These findings suggest that maintenance strategies should be tailored to the specific needs of plants, processes, product lines, and machines, as some groups are clearly more robust, while others require more frequent interventions.

In [ ]:
# Convert dates to datetime format
data_swire_copy['EXECUTION_FINISH_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_FINISH_DATE'], errors='coerce')
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_FINISH_DATE', 'EXECUTION_START_DATE'])

# Calculate duration from EXECUTION_FINISH_DATE to the next unplanned maintenance
data_swire_copy['duration_new'] = (data_swire_copy['EXECUTION_FINISH_DATE'].max() - data_swire_copy['EXECUTION_FINISH_DATE']).dt.days
data_swire_copy['event_new'] = 1  # Assuming all rows represent events

# Initialize Kaplan-Meier fitter
kmf = KaplanMeierFitter()

# Impact of Part Replacement on Time to Next Maintenance
print("\nHypothesis 2: Impact of Part Replacement on Time to Next Maintenance")
plt.figure(figsize=(10, 6))
for replacement in [True, False]:  # True if 'replace' is in ORDER_DESCRIPTION
    subset = data_swire_copy[data_swire_copy['ORDER_DESCRIPTION'].str.contains('replace', case=False) == replacement]
    label = "Replacement" if replacement else "No Replacement"

    # Fit Kaplan-Meier model for each group
    kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=label)
    kmf.plot_survival_function(ci_show=False)

    # Print median survival time
    median_survival = kmf.median_survival_time_
    print(f"{label} - Median Survival Time: {median_survival} days")

    # Print survival probability at specific time points
    for days in [30, 90, 180, 365]:
        survival_prob = kmf.survival_function_at_times(days).values[0]
        print(f"{label} - Survival Probability at {days} days: {survival_prob:.4f}")

plt.title("Kaplan-Meier Survival Curves for Part Replacement vs. No Replacement")
plt.xlabel("Days Since Maintenance Completion")
plt.ylabel("Survival Probability")
plt.legend(title='Replacement Status')
plt.grid(True)
plt.show()

# Hypothesis 4: Differences by Hierarchical Level (Plant Level)
print("\nHypothesis 4: Differences in Survival by Plant Level")
plt.figure(figsize=(10, 6))
for plant in data_swire_copy['Plant'].dropna().unique():
    subset = data_swire_copy[data_swire_copy['Plant'] == plant]
    kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=f"Plant {plant}")
    kmf.plot_survival_function(ci_show=False)

    # Print median survival time for each plant
    median_survival = kmf.median_survival_time_
    print(f"Plant {plant} - Median Survival Time: {median_survival} days")

    # Print survival probabilities at specific time points
    for days in [30, 90, 180, 365]:
        survival_prob = kmf.survival_function_at_times(days).values[0]
        print(f"Plant {plant} - Survival Probability at {days} days: {survival_prob:.4f}")

plt.title("Kaplan-Meier Survival Curves by Plant Level")
plt.xlabel("Days Since Maintenance Completion")
plt.ylabel("Survival Probability")
plt.legend(title='Plant Level')
plt.grid(True)
plt.show()

# Influence of Maintenance Season on Future Failures
print("\nHypothesis 5: Influence of Maintenance Season on Time to Failure")
# Define seasons based on finish date
data_swire_copy['season'] = data_swire_copy['EXECUTION_FINISH_DATE'].dt.month % 12 // 3 + 1  # 1=Winter, 2=Spring, 3=Summer, 4=Fall
season_labels = {1: "Winter", 2: "Spring", 3: "Summer", 4: "Fall"}
plt.figure(figsize=(10, 6))
for season in data_swire_copy['season'].unique():
    subset = data_swire_copy[data_swire_copy['season'] == season]
    label = season_labels[season]

    # Fit Kaplan-Meier model for each season
    kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=label)
    kmf.plot_survival_function(ci_show=False)

    # Print median survival time for each season
    median_survival = kmf.median_survival_time_
    print(f"{label} Season - Median Survival Time: {median_survival} days")

    # Print survival probabilities at specific time points
    for days in [30, 90, 180, 365]:
        survival_prob = kmf.survival_function_at_times(days).values[0]
        print(f"{label} Season - Survival Probability at {days} days: {survival_prob:.4f}")

plt.title("Kaplan-Meier Survival Curves by Maintenance Season")
plt.xlabel("Days Since Maintenance Completion")
plt.ylabel("Survival Probability")
plt.legend(title='Season')
plt.grid(True)
plt.show()
Hypothesis 2: Impact of Part Replacement on Time to Next Maintenance
Replacement - Median Survival Time: 1222.0 days
Replacement - Survival Probability at 30 days: 0.9849
Replacement - Survival Probability at 90 days: 0.9584
Replacement - Survival Probability at 180 days: 0.9234
Replacement - Survival Probability at 365 days: 0.8416
No Replacement - Median Survival Time: 1197.0 days
No Replacement - Survival Probability at 30 days: 0.9843
No Replacement - Survival Probability at 90 days: 0.9542
No Replacement - Survival Probability at 180 days: 0.9121
No Replacement - Survival Probability at 365 days: 0.8262
No description has been provided for this image
Hypothesis 4: Differences in Survival by Plant Level
Plant G812 - Median Survival Time: 771.0 days
Plant G812 - Survival Probability at 30 days: 0.9812
Plant G812 - Survival Probability at 90 days: 0.9425
Plant G812 - Survival Probability at 180 days: 0.8853
Plant G812 - Survival Probability at 365 days: 0.7502
Plant G291 - Median Survival Time: 1387.0 days
Plant G291 - Survival Probability at 30 days: 0.9864
Plant G291 - Survival Probability at 90 days: 0.9567
Plant G291 - Survival Probability at 180 days: 0.9152
Plant G291 - Survival Probability at 365 days: 0.8421
Plant G816 - Median Survival Time: 1143.0 days
Plant G816 - Survival Probability at 30 days: 0.9810
Plant G816 - Survival Probability at 90 days: 0.9498
Plant G816 - Survival Probability at 180 days: 0.9149
Plant G816 - Survival Probability at 365 days: 0.8529
Plant G811 - Median Survival Time: 821.0 days
Plant G811 - Survival Probability at 30 days: 0.9730
Plant G811 - Survival Probability at 90 days: 0.9253
Plant G811 - Survival Probability at 180 days: 0.8577
Plant G811 - Survival Probability at 365 days: 0.7443
Plant G261 - Median Survival Time: 1242.0 days
Plant G261 - Survival Probability at 30 days: 0.9891
Plant G261 - Survival Probability at 90 days: 0.9702
Plant G261 - Survival Probability at 180 days: 0.9421
Plant G261 - Survival Probability at 365 days: 0.8667
Plant G221 - Median Survival Time: 1505.0 days
Plant G221 - Survival Probability at 30 days: 0.9892
Plant G221 - Survival Probability at 90 days: 0.9635
Plant G221 - Survival Probability at 180 days: 0.9200
Plant G221 - Survival Probability at 365 days: 0.8250
Plant G999 - Median Survival Time: 123.0 days
Plant G999 - Survival Probability at 30 days: 0.8869
Plant G999 - Survival Probability at 90 days: 0.6190
Plant G999 - Survival Probability at 180 days: 0.3214
Plant G999 - Survival Probability at 365 days: 0.0119
No description has been provided for this image
Hypothesis 5: Influence of Maintenance Season on Time to Failure
Spring Season - Median Survival Time: 1254.0 days
Spring Season - Survival Probability at 30 days: 1.0000
Spring Season - Survival Probability at 90 days: 1.0000
Spring Season - Survival Probability at 180 days: 0.8558
Spring Season - Survival Probability at 365 days: 0.8531
Fall Season - Median Survival Time: 1425.0 days
Fall Season - Survival Probability at 30 days: 1.0000
Fall Season - Survival Probability at 90 days: 1.0000
Fall Season - Survival Probability at 180 days: 1.0000
Fall Season - Survival Probability at 365 days: 0.8511
Winter Season - Median Survival Time: 1332.0 days
Winter Season - Survival Probability at 30 days: 1.0000
Winter Season - Survival Probability at 90 days: 1.0000
Winter Season - Survival Probability at 180 days: 1.0000
Winter Season - Survival Probability at 365 days: 0.8471
Summer Season - Median Survival Time: 1179.0 days
Summer Season - Survival Probability at 30 days: 0.9500
Summer Season - Survival Probability at 90 days: 0.8556
Summer Season - Survival Probability at 180 days: 0.8556
Summer Season - Survival Probability at 365 days: 0.8535
No description has been provided for this image

The Kaplan-Meier survival analysis explores the impact of various factors on time to failure after maintenance. In Hypothesis 2, examining the impact of part replacement, we observe a median survival time of 1222 days for cases with part replacement, compared to 1197 days without replacement. Survival probabilities are slightly higher in the replacement group at early time points, with a survival probability of 0.9849 at 30 days, decreasing to 0.8416 at 365 days. The no-replacement group also shows high early survival probabilities but slightly lower at each checkpoint, indicating that part replacement may have a marginal positive effect on prolonging survival times.

In Hypothesis 4, survival curves by plant level reveal considerable variation in median survival times. Plant G999 has the shortest median survival time at 123 days, with a rapid decline in survival probability to 0.0119 by day 365, indicating higher failure rates. In contrast, Plant G221 has the longest median survival time at 1505 days, maintaining a survival probability of 0.8250 at 365 days, suggesting better maintenance outcomes.

Hypothesis 5 investigates the effect of maintenance season on future failures. Fall and Winter seasons show the longest median survival times at 1425 and 1332 days, respectively, with consistently high survival probabilities over 365 days. Spring follows closely with a median of 1254 days, while Summer has the shortest median survival at 1179 days. This seasonal variation implies that maintenance done in warmer months (Summer) might correlate with slightly shorter time-to-failure, potentially due to environmental or operational factors. Overall, the analysis highlights that part replacement, plant-specific practices, and seasonality can impact equipment longevity post-maintenance.

In [ ]:
# Filter for unplanned maintenance data
df_unplanned = data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'].str.contains('Unplanned', case=False, na=False)]

# Filter rows where shutdowns occurred based on 'ORDER_DESCRIPTION'
shutdown_rows = df_unplanned[df_unplanned['ORDER_DESCRIPTION'].str.contains('shutdown', case=False, na=False)]

# Convert 'EXECUTION_START_DATE' to datetime format for calculating durations
shutdown_rows['EXECUTION_START_DATE'] = pd.to_datetime(shutdown_rows['EXECUTION_START_DATE'], errors='coerce')

# Calculate the duration to the latest shutdown date
fixed_future_date = shutdown_rows['EXECUTION_START_DATE'].max()
shutdown_rows['duration_new'] = (fixed_future_date - shutdown_rows['EXECUTION_START_DATE']).dt.days

# Create an 'event_new' column indicating an unplanned maintenance occurrence (1 for event)
shutdown_rows['event_new'] = 1  # Since shutdown_rows only includes unplanned events

# Initialize the Kaplan-Meier fitter
kmf = KaplanMeierFitter()

# Dictionary to store median survival times for shutdown vs. non-shutdown events
median_survival_times = {}

# List of shutdown vs non-shutdown groups
shutdown_status = ["Shutdown", "No Shutdown"]

plt.figure(figsize=(12, 8))

# Loop over each status to fit Kaplan-Meier models
for status in shutdown_status:
    if status == "Shutdown":
        type_df = shutdown_rows
    else:
        type_df = df_unplanned[~df_unplanned['ORDER_DESCRIPTION'].str.contains('shutdown', case=False, na=False)]

    # Ensure EXECUTION_START_DATE is in datetime format
    type_df['EXECUTION_START_DATE'] = pd.to_datetime(type_df['EXECUTION_START_DATE'], errors='coerce')
    type_df = type_df.dropna(subset=['EXECUTION_START_DATE'])

    # Calculate duration based on the latest shutdown date
    type_df['duration_new'] = (fixed_future_date - type_df['EXECUTION_START_DATE']).dt.days
    type_df['event_new'] = 1  # Assuming all rows are events

    # Fit Kaplan-Meier model for each shutdown status
    kmf.fit(durations=type_df['duration_new'], event_observed=type_df['event_new'], label=status)

    # Store and print median survival time for each type
    median_survival_times[status] = kmf.median_survival_time_
    print(f"Median Survival Time for {status}: {kmf.median_survival_time_} days")

    # Plot the survival function for this group
    kmf.plot_survival_function(ci_show=False)

# Label the plot for better visualization
plt.title("Kaplan-Meier Survival Curves for Shutdown vs. Non-Shutdown Maintenance Events")
plt.xlabel("Days Since Last Maintenance")
plt.ylabel("Survival Probability")
plt.legend()
plt.grid(True)
plt.show()

# Output the calculated median survival times
print("\nMedian Survival Times (in days):")
for status, median_time in median_survival_times.items():
    print(f"{status}: {median_time} days")
<ipython-input-69-a1b1a05dc133>:14: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  shutdown_rows['EXECUTION_START_DATE'] = pd.to_datetime(shutdown_rows['EXECUTION_START_DATE'], errors='coerce')
<ipython-input-69-a1b1a05dc133>:18: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  shutdown_rows['duration_new'] = (fixed_future_date - shutdown_rows['EXECUTION_START_DATE']).dt.days
<ipython-input-69-a1b1a05dc133>:21: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  shutdown_rows['event_new'] = 1  # Since shutdown_rows only includes unplanned events
<ipython-input-69-a1b1a05dc133>:42: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  type_df['EXECUTION_START_DATE'] = pd.to_datetime(type_df['EXECUTION_START_DATE'], errors='coerce')
Median Survival Time for Shutdown: 421.0 days
<ipython-input-69-a1b1a05dc133>:42: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  type_df['EXECUTION_START_DATE'] = pd.to_datetime(type_df['EXECUTION_START_DATE'], errors='coerce')
Median Survival Time for No Shutdown: 1340.0 days
No description has been provided for this image
Median Survival Times (in days):
Shutdown: 421.0 days
No Shutdown: 1340.0 days

The Kaplan-Meier survival analysis for shutdown versus non-shutdown maintenance events reveals a significant difference in survival probabilities, emphasizing the impact of shutdown maintenance on time to next failure. The median survival time for equipment undergoing shutdown maintenance events is 421 days, indicating a shorter duration before another maintenance event is needed. This is in stark contrast to non-shutdown maintenance, where the median survival extends to 1340 days. The survival curve further illustrates this difference: equipment undergoing shutdown maintenance shows a steep decline in survival probability, with rapid failure rates early on, likely due to the severity or emergency nature of these events. Conversely, the non-shutdown maintenance curve gradually declines, suggesting that regular or less severe maintenance interventions effectively extend equipment longevity.

In summary, the shorter median survival time and steep decline in the shutdown maintenance survival curve suggest that shutdown events are associated with higher failure rates and shorter intervals between maintenance. This insight can inform maintenance planning by highlighting the need for preventive strategies or further investigation into the causes of shutdown events to reduce equipment downtime and maintenance frequency.

In [ ]:
# Convert relevant date columns to datetime format
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy['EXECUTION_FINISH_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_FINISH_DATE'], errors='coerce')
data_swire_copy['EQUIP_VALID_FROM'] = pd.to_datetime(data_swire_copy['EQUIP_VALID_FROM'], errors='coerce')
data_swire_copy['EQUIP_VALID_TO'] = pd.to_datetime(data_swire_copy['EQUIP_VALID_TO'], errors='coerce')
data_swire_copy.dropna(subset=['EXECUTION_START_DATE', 'EXECUTION_FINISH_DATE'], inplace=True)

# Calculate duration from EXECUTION_FINISH_DATE to a future date or next maintenance event
data_swire_copy['duration_new'] = (data_swire_copy['EXECUTION_FINISH_DATE'].max() - data_swire_copy['EXECUTION_FINISH_DATE']).dt.days
data_swire_copy['event_new'] = 1  # Assuming all rows represent events (maintenance was completed)

# Initialize Kaplan-Meier fitter
kmf = KaplanMeierFitter()

# Age of Equipment and Frequency of Breakdown
print("\nHypothesis 1: Age of Equipment and Frequency of Breakdown")
data_swire_copy['equipment_age'] = (data_swire_copy['EXECUTION_FINISH_DATE'] - data_swire_copy['EQUIP_VALID_FROM']).dt.days
age_categories = pd.qcut(data_swire_copy['equipment_age'], q=4)

plt.figure(figsize=(10, 6))
for age_group in age_categories.cat.categories:  # Loop over the interval categories
    subset = data_swire_copy[(data_swire_copy['equipment_age'] >= age_group.left) & (data_swire_copy['equipment_age'] <= age_group.right)]
    kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=f"Age {age_group}")
    kmf.plot_survival_function(ci_show=False)

plt.title("Survival Curves by Equipment Age")
plt.xlabel("Days Since Last Maintenance")
plt.ylabel("Survival Probability")
plt.legend(title='Equipment Age Group')
plt.grid(True)
plt.show()

# Impact of Planned vs. Unplanned Maintenance on Downtime
print("\nHypothesis 2: Planned vs. Unplanned Maintenance")
for maintenance_type in ['Preventive', 'Corrective']:
    subset = data_swire_copy[data_swire_copy['MAINTENANCE_TYPE_DESCRIPTION'].str.contains(maintenance_type, case=False, na=False)]
    kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=f"{maintenance_type} Maintenance")
    kmf.plot_survival_function(ci_show=False)

plt.title("Survival Curves by Maintenance Type")
plt.xlabel("Days Since Last Maintenance")
plt.ylabel("Survival Probability")
plt.legend(title='Maintenance Type')
plt.grid(True)
plt.show()

# Hypothesis 3: Seasonal Influence on Maintenance Needs
print("\nHypothesis 3: Influence of Season on Maintenance")
data_swire_copy['season'] = data_swire_copy['EXECUTION_FINISH_DATE'].dt.month % 12 // 3 + 1  # 1=Winter, 2=Spring, etc.
season_labels = {1: "Winter", 2: "Spring", 3: "Summer", 4: "Fall"}
plt.figure(figsize=(10, 6))
for season in data_swire_copy['season'].unique():
    subset = data_swire_copy[data_swire_copy['season'] == season]
    kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=season_labels[season])
    kmf.plot_survival_function(ci_show=False)

plt.title("Survival Curves by Season")
plt.xlabel("Days Since Maintenance Completion")
plt.ylabel("Survival Probability")
plt.legend(title='Season')
plt.grid(True)
plt.show()

# Type of Maintenance and Part Replacement Frequency
print("\nHypothesis 4: Type of Maintenance and Part Replacement")
for replacement in [True, False]:  # True if 'replace' is in ORDER_DESCRIPTION
    subset = data_swire_copy[data_swire_copy['ORDER_DESCRIPTION'].str.contains('replace', case=False) == replacement]
    label = "Replacement" if replacement else "No Replacement"
    kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=label)
    kmf.plot_survival_function(ci_show=False)

plt.title("Survival Curves for Part Replacement vs. No Replacement")
plt.xlabel("Days Since Maintenance Completion")
plt.ylabel("Survival Probability")
plt.legend(title='Replacement Status')
plt.grid(True)
plt.show()

# Hypothesis 5: Downtime and Opportunity Cost
print("\nHypothesis 5: Downtime and Opportunity Cost")
data_swire_copy['downtime_category'] = pd.qcut(data_swire_copy['ACTUAL_WORK_IN_MINUTES'], q=3, labels=["Low", "Medium", "High"])
for downtime_category in data_swire_copy['downtime_category'].unique():
    subset = data_swire_copy[data_swire_copy['downtime_category'] == downtime_category]
    kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=f"Downtime: {downtime_category}")
    kmf.plot_survival_function(ci_show=False)

plt.title("Survival Curves by Downtime Category")
plt.xlabel("Days Since Last Maintenance")
plt.ylabel("Survival Probability")
plt.legend(title='Downtime Category')
plt.grid(True)
plt.show()

# Functional Location and Breakdown Frequency
print("\nHypothesis 6: Functional Location and Breakdown Frequency")
plt.figure(figsize=(10, 6))
for location in data_swire_copy['FUNCTIONAL_LOC'].dropna().unique()[:5]:  # Limit to first 5 locations for clarity
    subset = data_swire_copy[data_swire_copy['FUNCTIONAL_LOC'] == location]
    kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=f"Location: {location}")
    kmf.plot_survival_function(ci_show=False)

plt.title("Survival Curves by Functional Location")
plt.xlabel("Days Since Last Maintenance")
plt.ylabel("Survival Probability")
plt.legend(title='Functional Location')
plt.grid(True)
plt.show()

# Part Replacement and Time to Next Failure
print("\nHypothesis 7: Part Replacement and Time to Next Failure")
for replacement in [True, False]:  # True if 'replace' is in ORDER_DESCRIPTION
    subset = data_swire_copy[data_swire_copy['ORDER_DESCRIPTION'].str.contains('replace', case=False) == replacement]
    label = "Replacement" if replacement else "No Replacement"
    kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=label)
    kmf.plot_survival_function(ci_show=False)

plt.title("Survival Curves for Part Replacement Impact on Time to Next Failure")
plt.xlabel("Days Since Maintenance Completion")
plt.ylabel("Survival Probability")
plt.legend(title='Replacement Status')
plt.grid(True)
plt.show()
Hypothesis 1: Age of Equipment and Frequency of Breakdown
No description has been provided for this image
Hypothesis 2: Planned vs. Unplanned Maintenance
No description has been provided for this image
Hypothesis 3: Influence of Season on Maintenance
No description has been provided for this image
Hypothesis 4: Type of Maintenance and Part Replacement
No description has been provided for this image
Hypothesis 5: Downtime and Opportunity Cost
No description has been provided for this image
Hypothesis 6: Functional Location and Breakdown Frequency
No description has been provided for this image
Hypothesis 7: Part Replacement and Time to Next Failure
No description has been provided for this image

The Kaplan-Meier analyses reveal that equipment age and functional location significantly impact survival times, with older equipment and certain locations exhibiting shorter times to failure, indicating higher maintenance needs. Planned maintenance (preventive) shows a slight advantage over unplanned (corrective) maintenance, suggesting potential benefits in extending equipment life. However, part replacement alone does not significantly extend survival time, as both replaced and non-replaced parts show similar failure patterns. Seasonal variations and downtime duration categories (high, medium, low) have minimal influence on survival probabilities, implying that these factors do not substantially affect maintenance intervals. Overall, age and location are primary factors influencing equipment longevity, while routine interventions like part replacement and seasonal timing have limited effect.

11.6 Kaplan-Meier Analysis for Functional Nodes ¶

In [ ]:
# Functional Node Impact on Maintenance Failure Frequency and Time to Next Failure

# Convert relevant date columns to datetime format
data_swire_copy['EXECUTION_FINISH_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_FINISH_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_FINISH_DATE'])

# Calculate duration from EXECUTION_FINISH_DATE to the end of observation period
data_swire_copy['duration_new'] = (data_swire_copy['EXECUTION_FINISH_DATE'].max() - data_swire_copy['EXECUTION_FINISH_DATE']).dt.days
data_swire_copy['event_new'] = 1  # Assuming all rows represent events (failure occurred)

# List of functional nodes to analyze
functional_nodes = [
    'FUNCTIONAL_AREA_NODE_1_MODIFIED', 'FUNCTIONAL_AREA_NODE_2_MODIFIED',
    'FUNCTIONAL_AREA_NODE_3_MODIFIED', 'FUNCTIONAL_AREA_NODE_4_MODIFIED',
    'FUNCTIONAL_AREA_NODE_5_MODIFIED'
]

# Initialize Kaplan-Meier fitter
kmf = KaplanMeierFitter()

# Loop through each functional node to calculate top 10 failure frequencies and plot
for node in functional_nodes:
    # Calculate failure frequency for each unique value in the current functional node
    failure_counts = data_swire_copy[node].value_counts().nlargest(10)

    print(f"\nTop 10 Failure Frequencies for {node}:")
    print(failure_counts)

    # Plot Kaplan-Meier survival curves for each of the top 10 values in the current node
    plt.figure(figsize=(10, 6))
    for value in failure_counts.index:
        subset = data_swire_copy[data_swire_copy[node] == value]

        # Fit Kaplan-Meier model for each value
        kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=f"{node}: {value}")

        # Print the median survival time for each node value
        median_survival_time = kmf.median_survival_time_
        print(f"Median Survival Time for {node} - {value}: {median_survival_time} days")

        # Plot the survival function
        kmf.plot_survival_function(ci_show=False)

    # Configure plot for better visualization
    plt.title(f"Kaplan-Meier Survival Curves for Top 10 Failure Frequencies in {node}")
    plt.xlabel("Days Since Last Maintenance")
    plt.ylabel("Survival Probability")
    plt.legend(title=node)
    plt.grid(True)
    plt.show()
Top 10 Failure Frequencies for FUNCTIONAL_AREA_NODE_1_MODIFIED:
FUNCTIONAL_AREA_NODE_1_MODIFIED
COTA PRODUCTION           70187
SILVERSTONE PRODUCTION    62625
PRODUCTION                60312
MONZA PRODUCTION          57458
SUZUKA PRODUCTION         36713
COOLER SERVICE             2699
FLEET                       123
Name: count, dtype: int64
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - COTA PRODUCTION: 1143.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - SILVERSTONE PRODUCTION: 1242.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - PRODUCTION: 767.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - MONZA PRODUCTION: 1386.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - SUZUKA PRODUCTION: 1515.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - COOLER SERVICE: 1418.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - FLEET: 1376.0 days
No description has been provided for this image
Top 10 Failure Frequencies for FUNCTIONAL_AREA_NODE_2_MODIFIED:
FUNCTIONAL_AREA_NODE_2_MODIFIED
CAN LINE                       34208
BTL_PET_LINE                   32789
LINES 1,2,9 - BOTTLE LINES     29729
BOTTLE LINE                    27543
LINES 3,4,10 - CAN LINES       23153
L1, L2, L6 PET BOTTLE LINES    22460
G812 PRD FILLER_ROTARY_CAN     17390
ROTARY_CAN_LINE                14016
G812 PRD FILLER_ROT_BTL_PET    11200
L4 - FILLER_ROTARY_CAN         10799
Name: count, dtype: int64
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - CAN LINE: 1095.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - BTL_PET_LINE: 1359.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - LINES 1,2,9 - BOTTLE LINES: 1289.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - BOTTLE LINE: 1186.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - LINES 3,4,10 - CAN LINES: 1175.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - L1, L2, L6 PET BOTTLE LINES: 1494.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - G812 PRD FILLER_ROTARY_CAN: 936.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - ROTARY_CAN_LINE: 1566.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - G812 PRD FILLER_ROT_BTL_PET: 1552.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - L4 - FILLER_ROTARY_CAN: 1576.0 days
No description has been provided for this image
Top 10 Failure Frequencies for FUNCTIONAL_AREA_NODE_3_MODIFIED:
FUNCTIONAL_AREA_NODE_3_MODIFIED
LINE #4-FILLER_ROTARY_CAN             35834
LINE #2 - FILLER_ROT_BTL_PET          23027
LINE #1 - FILLER_ROT_BTL_PET          18290
LINE #1-CAN_LINE                      17844
LINE #2 - BOTTLE_LINE_PET             17774
LINE #3-CAN_LINE                      16364
G812 PRD FILLER_ROTARY_CAN LINE 1     12984
LINE #3-FILLER_ROTARY_CAN             12013
G812 PRD FILLER_ROT_BTL_PET LINE 2    11200
LINE #2 - ROT_BTL_PET                 10674
Name: count, dtype: int64
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #4-FILLER_ROTARY_CAN: 1396.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #2 - FILLER_ROT_BTL_PET: 1379.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #1 - FILLER_ROT_BTL_PET: 1587.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #1-CAN_LINE: 1061.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #2 - BOTTLE_LINE_PET: 1191.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #3-CAN_LINE: 1137.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - G812 PRD FILLER_ROTARY_CAN LINE 1: 1554.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #3-FILLER_ROTARY_CAN: 1237.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - G812 PRD FILLER_ROT_BTL_PET LINE 2: 1552.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #2 - ROT_BTL_PET: 1298.0 days
No description has been provided for this image
Top 10 Failure Frequencies for FUNCTIONAL_AREA_NODE_4_MODIFIED:
FUNCTIONAL_AREA_NODE_4_MODIFIED
FILLER               46322
PACKER               35230
CONVEYOR             34765
LABELER              18945
PALLETIZER           16239
DEPALLETIZER         10850
DETECTOR, PRODUCT     9404
SEAMER, ROTARY        8510
CAPPER                8350
WARMER_COOLER         7686
Name: count, dtype: int64
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - FILLER: 1177.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - PACKER: 1193.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - CONVEYOR: 1139.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - LABELER: 1524.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - PALLETIZER: 1242.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - DEPALLETIZER: 1114.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - DETECTOR, PRODUCT: 1576.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - SEAMER, ROTARY: 1204.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - CAPPER: 1621.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - WARMER_COOLER: 1606.0 days
No description has been provided for this image
Top 10 Failure Frequencies for FUNCTIONAL_AREA_NODE_5_MODIFIED:
FUNCTIONAL_AREA_NODE_5_MODIFIED
FULL CAN TO ACCUMULATION TABLE           251
PACKER 1, MODEL DUODOZEN 1250SX          210
FULL CASE CONVEYORS                      158
EMPTY CASE CONVEYORS                     136
FULL CASE SPIRAL TO PALLETIZER           125
FB CONVEYORS TO DOUGLAS/HI-CONE SPLIT    114
EMPTY CAN                                110
COMPRESSOR                               110
PACKER 2, MODEL FENIX 185V               108
FULL CASE CONVEYOR                       101
Name: count, dtype: int64
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL CAN TO ACCUMULATION TABLE: 842.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - PACKER 1, MODEL DUODOZEN 1250SX: 176.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL CASE CONVEYORS: 745.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - EMPTY CASE CONVEYORS: 711.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL CASE SPIRAL TO PALLETIZER: 478.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FB CONVEYORS TO DOUGLAS/HI-CONE SPLIT: 1050.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - EMPTY CAN: 938.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - COMPRESSOR: 555.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - PACKER 2, MODEL FENIX 185V: 167.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL CASE CONVEYOR: 402.0 days
No description has been provided for this image

The analysis of maintenance failures across different functional nodes highlights key areas within the system that experience frequent breakdowns and have varying resilience levels, as measured by median survival times. In NODE 1, "COTA PRODUCTION" and "SILVERSTONE PRODUCTION" have the highest failure frequencies, with 70,187 and 62,625 failures respectively, and median survival times of 1,143 and 1,242 days. These values suggest that while these production areas are prone to frequent issues, they still sustain operations for substantial periods before requiring intervention. NODE 2 focuses on specific production lines, where "CAN LINE" and "BTL_PET_LINE" show high frequencies (34,208 and 32,789 failures) and median survival times of 1,095 and 1,359 days, respectively, indicating that these bottle and can lines are critical points in the system with substantial operational capacity before failure.

In NODE 3, maintenance records highlight individual lines, such as "LINE #4-FILLER_ROTARY_CAN" with 35,834 failures and a median survival time of 1,396 days, and "LINE #2 - FILLER_ROT_BTL_PET" with 23,027 failures and a 1,379-day median survival. These findings point to high use and wear on rotary fillers and bottle lines, suggesting they are integral but heavily strained sections that might benefit from targeted maintenance. NODE 4 reveals that components like "FILLER" and "PACKER" have the highest failure rates, with median survival times of 1,177 and 1,193 days, respectively, showing these parts sustain frequent usage and require regular upkeep to maintain functionality. Lastly, in NODE 5, smaller, more specific components like "FULL CAN TO ACCUMULATION TABLE" (251 failures) and "PACKER 1, MODEL DUODOZEN 1250SX" (210 failures) have lower median survival times, particularly "PACKER 1" with only 176 days, suggesting these parts might be particularly vulnerable or critical in the production process. This node-by-node breakdown highlights both heavily utilized and vulnerable areas, allowing for informed prioritization in maintenance planning to enhance overall reliability and minimize unplanned downtimes.

In [ ]:
# Impact of Functional Node on Maintenance Duration and Frequency of Failures

# Convert 'ACTUAL_WORK_IN_MINUTES' to hours for better interpretation
data_swire_copy['maintenance_duration_hours'] = data_swire_copy['ACTUAL_WORK_IN_MINUTES'] / 60

# Initialize Kaplan-Meier fitter
kmf = KaplanMeierFitter()

# List of functional nodes to analyze
functional_nodes = [
    'FUNCTIONAL_AREA_NODE_1_MODIFIED', 'FUNCTIONAL_AREA_NODE_2_MODIFIED',
    'FUNCTIONAL_AREA_NODE_3_MODIFIED', 'FUNCTIONAL_AREA_NODE_4_MODIFIED',
    'FUNCTIONAL_AREA_NODE_5_MODIFIED'
]

# Analyze maintenance duration by functional node
for node in functional_nodes:
    # Calculate top 10 most frequent values in the current node
    top_values = data_swire_copy[node].value_counts().nlargest(10)

    print(f"\nKaplan-Meier Survival Analysis of Maintenance Duration for {node}")
    plt.figure(figsize=(10, 6))

    # Loop through the top 10 values for the current functional node
    for value in top_values.index:
        subset = data_swire_copy[data_swire_copy[node] == value]

        # Fit Kaplan-Meier model with maintenance duration as 'duration'
        kmf.fit(durations=subset['maintenance_duration_hours'], event_observed=subset['event_new'], label=f"{node}: {value}")

        # Print the median survival time for each node value
        median_survival_time = kmf.median_survival_time_
        print(f"Median Survival Time for {node} - {value}: {median_survival_time} hours")

        # Plot the survival function
        kmf.plot_survival_function(ci_show=False)

    plt.title(f"Kaplan-Meier Survival Curves for Maintenance Duration by {node}")
    plt.xlabel("Hours of Maintenance Work")
    plt.ylabel("Survival Probability")
    plt.legend(title=node)
    plt.grid(True)
    plt.show()
Kaplan-Meier Survival Analysis of Maintenance Duration for FUNCTIONAL_AREA_NODE_1_MODIFIED
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - COTA PRODUCTION: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - SILVERSTONE PRODUCTION: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - PRODUCTION: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - MONZA PRODUCTION: 0.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - SUZUKA PRODUCTION: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - COOLER SERVICE: 0.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - FLEET: 1.5 hours
No description has been provided for this image
Kaplan-Meier Survival Analysis of Maintenance Duration for FUNCTIONAL_AREA_NODE_2_MODIFIED
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - CAN LINE: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - BTL_PET_LINE: 0.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - LINES 1,2,9 - BOTTLE LINES: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - BOTTLE LINE: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - LINES 3,4,10 - CAN LINES: 0.9 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - L1, L2, L6 PET BOTTLE LINES: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - G812 PRD FILLER_ROTARY_CAN: 0.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - ROTARY_CAN_LINE: 0.3 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - G812 PRD FILLER_ROT_BTL_PET: 0.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - L4 - FILLER_ROTARY_CAN: 1.0 hours
No description has been provided for this image
Kaplan-Meier Survival Analysis of Maintenance Duration for FUNCTIONAL_AREA_NODE_3_MODIFIED
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #4-FILLER_ROTARY_CAN: 0.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #2 - FILLER_ROT_BTL_PET: 0.7 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #1 - FILLER_ROT_BTL_PET: 0.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #1-CAN_LINE: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #2 - BOTTLE_LINE_PET: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #3-CAN_LINE: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - G812 PRD FILLER_ROTARY_CAN LINE 1: 0.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #3-FILLER_ROTARY_CAN: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - G812 PRD FILLER_ROT_BTL_PET LINE 2: 0.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #2 - ROT_BTL_PET: 1.0 hours
No description has been provided for this image
Kaplan-Meier Survival Analysis of Maintenance Duration for FUNCTIONAL_AREA_NODE_4_MODIFIED
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - FILLER: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - PACKER: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - CONVEYOR: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - LABELER: 0.75 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - PALLETIZER: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - DEPALLETIZER: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - DETECTOR, PRODUCT: 0.3 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - SEAMER, ROTARY: 0.9 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - CAPPER: 0.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - WARMER_COOLER: 0.5 hours
No description has been provided for this image
Kaplan-Meier Survival Analysis of Maintenance Duration for FUNCTIONAL_AREA_NODE_5_MODIFIED
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL CAN TO ACCUMULATION TABLE: 1.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - PACKER 1, MODEL DUODOZEN 1250SX: 1.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL CASE CONVEYORS: 1.5 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - EMPTY CASE CONVEYORS: 1.25 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL CASE SPIRAL TO PALLETIZER: 2.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FB CONVEYORS TO DOUGLAS/HI-CONE SPLIT: 2.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - EMPTY CAN: 1.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - COMPRESSOR: 2.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - PACKER 2, MODEL FENIX 185V: 3.0 hours
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL CASE CONVEYOR: 1.0 hours
No description has been provided for this image

The Kaplan-Meier survival analysis of maintenance duration across different functional nodes reveals notable differences in time required for maintenance among key areas within the system. In NODE 1, areas like "COTA PRODUCTION," "SILVERSTONE PRODUCTION," and "PRODUCTION" have a median maintenance duration of 1.0 hour, indicating that these areas typically require relatively brief maintenance interventions, though they frequently require attention. NODE 2 shows similar short maintenance durations, with "CAN LINE" and "BTL_PET_LINE" requiring a median of 1.0 and 0.5 hours, respectively, suggesting these lines are critical but manageable in terms of downtime when maintenance is needed. In NODE 3, components such as "LINE #4-FILLER_ROTARY_CAN" and "LINE #2 - FILLER_ROT_BTL_PET" also demonstrate short maintenance durations, around 0.5 to 1.0 hours, indicating a high turnover in maintenance for these lines without extensive downtime.

NODE 4 includes components like "FILLER" and "PACKER," both showing median maintenance times of 1.0 hour, while "DETECTOR, PRODUCT" and "CAPPER" have lower durations, at 0.3 and 0.5 hours respectively. This suggests that, while essential, these components require minimal time for maintenance, likely allowing for quick resumption of operations. NODE 5 presents the longest median maintenance durations, with "PACKER 2, MODEL FENIX 185V" requiring up to 3.0 hours and other areas, such as "FULL CASE SPIRAL TO PALLETIZER" and "COMPRESSOR," taking 2.0 hours. These longer durations indicate these are likely bottlenecks in the maintenance process where more resources or time are needed to resolve issues. Overall, the findings provide insight into maintenance efficiency, with shorter durations seen in frequently maintained production lines and longer times in more specialized or complex nodes, suggesting targeted strategies for optimizing maintenance schedules across these functional nodes.

In [ ]:
# Influence of Part Replacement Frequency by Functional Node on Time to Next Failure

# Convert relevant date columns to datetime format
data_swire_copy['EXECUTION_FINISH_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_FINISH_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_FINISH_DATE'])

# Calculate duration from EXECUTION_FINISH_DATE to the end of observation period
data_swire_copy['duration_new'] = (data_swire_copy['EXECUTION_FINISH_DATE'].max() - data_swire_copy['EXECUTION_FINISH_DATE']).dt.days
data_swire_copy['event_new'] = 1  # Assuming all rows represent events (failure occurred)

# Filter rows where parts have been replaced based on 'ORDER_DESCRIPTION'
replace_rows = data_swire_copy[data_swire_copy['ORDER_DESCRIPTION'].str.contains('replace', case=False, na=False)]

# Initialize Kaplan-Meier fitter
kmf = KaplanMeierFitter()

# List of functional nodes to analyze
functional_nodes = [
    'FUNCTIONAL_AREA_NODE_1_MODIFIED', 'FUNCTIONAL_AREA_NODE_2_MODIFIED',
    'FUNCTIONAL_AREA_NODE_3_MODIFIED', 'FUNCTIONAL_AREA_NODE_4_MODIFIED',
    'FUNCTIONAL_AREA_NODE_5_MODIFIED'
]

# Analyze part replacement frequency by functional node
for node in functional_nodes:
    # Calculate top 10 most frequent values in the current node where replacements were performed
    top_values = replace_rows[node].value_counts().nlargest(10)

    print(f"\nKaplan-Meier Survival Analysis for Part Replacement Frequency by {node}")
    plt.figure(figsize=(10, 6))

    # Loop through the top 10 values for the current functional node
    for value in top_values.index:
        subset = replace_rows[replace_rows[node] == value]

        # Fit Kaplan-Meier model with duration from EXECUTION_FINISH_DATE to the observation end as the 'duration'
        kmf.fit(durations=subset['duration_new'], event_observed=subset['event_new'], label=f"{node}: {value}")

        # Print the median survival time for each node value
        median_survival_time = kmf.median_survival_time_
        print(f"Median Survival Time for {node} - {value}: {median_survival_time} days")

        # Plot the survival function
        kmf.plot_survival_function(ci_show=False)

    # Configure plot for better visualization
    plt.title(f"Kaplan-Meier Survival Curves for Part Replacement Frequency by {node}")
    plt.xlabel("Days Since Last Maintenance")
    plt.ylabel("Survival Probability")
    plt.legend(title=node)
    plt.grid(True)
    plt.show()
Kaplan-Meier Survival Analysis for Part Replacement Frequency by FUNCTIONAL_AREA_NODE_1_MODIFIED
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - SILVERSTONE PRODUCTION: 1276.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - COTA PRODUCTION: 1314.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - PRODUCTION: 1022.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - SUZUKA PRODUCTION: 1329.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - MONZA PRODUCTION: 1064.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_1_MODIFIED - FLEET: 1161.0 days
No description has been provided for this image
Kaplan-Meier Survival Analysis for Part Replacement Frequency by FUNCTIONAL_AREA_NODE_2_MODIFIED
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - LINES 1,2,9 - BOTTLE LINES: 1209.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - CAN LINE: 1229.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - BOTTLE LINE: 1400.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - LINES 3,4,10 - CAN LINES: 1229.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - LINE 2,3 PET BOTTLE LINES: 1067.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - L1, L2, L6 PET BOTTLE LINES: 1284.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - BTL_PET_LINE: 1162.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - G812 PRD FILLER_ROTARY_CAN: 1011.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - LINE 1 CAN: 962.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_2_MODIFIED - OFFICES: 1575.0 days
No description has been provided for this image
Kaplan-Meier Survival Analysis for Part Replacement Frequency by FUNCTIONAL_AREA_NODE_3_MODIFIED
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #4-FILLER_ROTARY_CAN: 1171.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #2 - FILLER_ROT_BTL_PET: 1184.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #2 - BOTTLE_LINE_PET: 1365.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #2 - ROT_BTL_PET: 1284.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #1-CAN_LINE: 1100.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #3-CAN_LINE: 1322.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #3-FILLER_ROTARY_CAN: 1300.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #1 - ROT_BTL_PET: 1286.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE #9 - ROT_BTL_PET: 1060.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_3_MODIFIED - LINE 2 LARGET PET: 1053.0 days
No description has been provided for this image
Kaplan-Meier Survival Analysis for Part Replacement Frequency by FUNCTIONAL_AREA_NODE_4_MODIFIED
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - FILLER: 1262.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - PACKER: 1169.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - CONVEYOR: 1123.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - LABELER: 1286.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - PALLETIZER: 1396.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - WRAPPER: 1169.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - RINSER: 1156.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - DEPALLETIZER: 1063.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - CAPPER: 1021.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_4_MODIFIED - WARMER_COOLER: 1265.0 days
No description has been provided for this image
Kaplan-Meier Survival Analysis for Part Replacement Frequency by FUNCTIONAL_AREA_NODE_5_MODIFIED
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL CAN TO ACCUMULATION TABLE: 895.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL CASE CONVEYORS: 715.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - EMPTY CASE CONVEYORS: 618.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL BOTTLE CONVEYORS: 764.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FB CONVEYORS TO DOUGLAS/HI-CONE SPLIT: 778.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - PACKER 1, MODEL DUODOZEN 1250SX: 254.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - EAST SYRUP TANKS: 796.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - FULL CASE SPIRAL TO PALLETIZER: 564.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - EMPTY CAN: 848.0 days
Median Survival Time for FUNCTIONAL_AREA_NODE_5_MODIFIED - *OLD SYSTEM MT CASE CONVEYORS*: 283.0 days
No description has been provided for this image

The Kaplan-Meier analysis reveals the impact of part replacement frequency across various functional nodes on the time to the next failure. In NODE 1, areas such as "SILVERSTONE PRODUCTION" and "COTA PRODUCTION" have median survival times of 1276 and 1314 days, respectively, indicating relatively extended durations before the next failure, potentially due to effective replacements in high-production areas. NODE 2 shows varied survival times, with "LINES 1,2,9 - BOTTLE LINES" and "BOTTLE LINE" experiencing 1209 and 1400 days before the next failure, respectively, suggesting that certain configurations benefit from a prolonged interval between failures when parts are replaced. In NODE 3, "LINE #2 - BOTTLE_LINE_PET" and "LINE #3-CAN_LINE" demonstrate high survival times of 1365 and 1322 days, indicating that specific bottling and canning lines respond well to replacements, extending their operational duration.

NODE 4 includes essential components like "PALLETIZER" and "LABELER," with survival times of 1396 and 1286 days, respectively, highlighting effective maintenance in these areas that significantly delays the next failure. This suggests that replacing parts in these functions contributes positively to prolonging operational efficiency. NODE 5, however, presents shorter survival times overall, with "PACKER 1, MODEL DUODOZEN 1250SX" and "OLD SYSTEM MT CASE CONVEYORS" exhibiting only 254 and 283 days, respectively, before failure. This node’s relatively low survival times imply that these areas may experience rapid deterioration or are less resilient to wear even with replacements. Overall, the findings illustrate that part replacement frequency plays a substantial role in influencing maintenance intervals, with some nodes showing extended periods before failures and others requiring more frequent interventions to maintain functionality.

In [ ]:
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.utils import median_survival_times
import matplotlib.pyplot as plt
import numpy as np
import re

# Load and preprocess data
data_swire_copy = pd.read_csv("IWC_Work_Orders_Extract.csv")
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy['EQUIP_START_UP_DATE'] = pd.to_datetime(data_swire_copy['EQUIP_START_UP_DATE'], errors='coerce')


# Calculate machine age and survival time
data_swire_copy['machine_age'] = (data_swire_copy['EXECUTION_START_DATE'] - data_swire_copy['EQUIP_START_UP_DATE']).dt.days

# Define event: 1 if failure occurred, 0 otherwise
data_swire_copy['failure_event'] = 1  # Assume every row represents a failure for simplicity

# Correct Equipment Type Extraction
equipment_types = ['filler', 'packer', 'blender', 'blower', 'labeler']
data_swire_copy['Equipment_Type'] = data_swire_copy['EQUIPMENT_DESC'].str.extract(r'(filler|packer|blender|blower|labeler)', flags=re.IGNORECASE)

# Check extraction success
print("Equipment types found:", data_swire_copy['Equipment_Type'].unique())

# 1. Survival Analysis for Equipment Types
plt.figure(figsize=(12, 8))
kmf = KaplanMeierFitter()

median_survival_times_dict = {}
for equipment_type in equipment_types:
    type_data = data_swire_copy[data_swire_copy['Equipment_Type'].str.contains(equipment_type, case=False, na=False)]
    if not type_data.empty:
        kmf.fit(type_data['machine_age'], event_observed=type_data['failure_event'], label=equipment_type.capitalize())
        median_survival = kmf.median_survival_time_  # Correctly access the median survival time
        median_survival_times_dict[equipment_type] = median_survival
        kmf.plot_survival_function()

plt.title("Survival Curves by Equipment Type")
plt.xlabel("Machine Age (Days)")
plt.ylabel("Survival Probability")
plt.legend(title="Equipment Type")
plt.show()

# Print survival times for validation
print("Median survival times:", median_survival_times_dict)

# 2. Simulate the Impact of Planned Maintenance for Unplanned Failures
preventability_factor = 0.8  # Assume 80% of failures before median survival time are preventable

data_swire_copy['preventive_maintenance'] = data_swire_copy.apply(
    lambda row: 1 if (
        row['machine_age'] < median_survival_times_dict.get(
            row['Equipment_Type'], float('inf')
        ) and np.random.random() < preventability_factor
    ) else 0, axis=1
)

# Compare unplanned maintenance with and without the intervention
before_intervention_unplanned = data_swire_copy['failure_event'].sum()
after_intervention_unplanned = data_swire_copy.loc[data_swire_copy['preventive_maintenance'] == 0, 'failure_event'].sum()

print(f"Unplanned maintenance events before intervention: {before_intervention_unplanned}")
print(f"Unplanned maintenance events after intervention: {after_intervention_unplanned}")

# Calculate downtime reduction for unplanned maintenance
downtime_reduction_unplanned = (before_intervention_unplanned - after_intervention_unplanned) / before_intervention_unplanned * 100
print(f"Estimated reduction in unplanned maintenance events: {downtime_reduction_unplanned:.2f}%")

# 3. Simulate the Impact of Planned Maintenance
before_intervention_planned = data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Planned'].shape[0]

data_swire_copy['planned_maintenance_intervention'] = data_swire_copy.apply(
    lambda row: 1 if (
        row['machine_age'] < median_survival_times_dict.get(
            row['Equipment_Type'], float('inf')
        )
    ) else 0, axis=1
)

after_intervention_planned = data_swire_copy[data_swire_copy['planned_maintenance_intervention'] == 1].shape[0]

if before_intervention_planned == 0:
    planned_maintenance_improvement = "Not Applicable (No Baseline Planned Maintenance)"
else:
    planned_maintenance_improvement = (after_intervention_planned - before_intervention_planned) / before_intervention_planned * 100

print(f"Planned maintenance events before intervention: {before_intervention_planned}")
print(f"Planned maintenance events after intervention: {after_intervention_planned}")
print(f"Estimated improvement in planned maintenance: {planned_maintenance_improvement}")

# 4. Validation Checks

# Check total maintenance events
print("Total maintenance events:", data_swire_copy.shape[0])

# Verify planned and unplanned event counts
print("Planned maintenance events:", data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Planned'].shape[0])
print("Unplanned maintenance events:", data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Unplanned'].shape[0])

# Verify preventive maintenance flags
preventive_events = data_swire_copy[data_swire_copy['preventive_maintenance'] == 1]
print("Preventive maintenance events flagged:", preventive_events.shape[0])
print(preventive_events.head())

# Analyze remaining unplanned maintenance events
unplanned_after = data_swire_copy.loc[data_swire_copy['preventive_maintenance'] == 0]
print("Remaining unplanned events after intervention:", unplanned_after.shape[0])
print(unplanned_after.head())

# Validate planned maintenance increase
planned_events_before = data_swire_copy[data_swire_copy['MAINTENANCE_ACTIVITY_TYPE'] == 'Planned']
planned_events_after = data_swire_copy[data_swire_copy['planned_maintenance_intervention'] == 1]
print("Planned events before intervention:", planned_events_before.shape[0])
print("Planned events after intervention:", planned_events_after.shape[0])

# Spot-check random samples for validation
print(data_swire_copy[['Equipment_Type', 'machine_age', 'failure_event',
                       'preventive_maintenance', 'planned_maintenance_intervention']].sample(10))
Equipment types found: [nan 'FILLER' 'LABELER' 'BLOWER' 'PACKER' 'BLENDER']
No description has been provided for this image
Median survival times: {'filler': 1237.0, 'packer': 1407.0, 'blender': 1139.0, 'blower': 1617.0, 'labeler': 1209.0}
Unplanned maintenance events before intervention: 1048575
Unplanned maintenance events after intervention: 1005972
Estimated reduction in unplanned maintenance events: 4.06%
Planned maintenance events before intervention: 104458
Planned maintenance events after intervention: 53250
Estimated improvement in planned maintenance: -49.02257366597102
Total maintenance events: 1048575
Planned maintenance events: 104458
Unplanned maintenance events: 944117
Preventive maintenance events flagged: 42603
      ORDER_ID PLANT_ID PRODUCTION_LOCATION EXECUTION_START_DATE  \
0  705642457.0     G812                ROMA           2024-05-04   
1  704191697.0     G812                ROMA           2022-09-13   
2  704466547.0     G812                ROMA           2022-12-21   
3  703834477.0     G812                ROMA           2022-07-04   
5  704948720.0     G291               MONZA           2023-09-08   

  EXECUTION_FINISH_DATE ACTUAL_START_TIME ACTUAL_FINISH_TIME  \
0             5/12/2024           00:00.0            04:08.0   
1             9/13/2022           00:00.0            17:24.0   
2            12/21/2022           00:00.0            00:00.0   
3              7/4/2022           00:00.0            00:00.0   
5              9/8/2023           00:00.0            00:00.0   

   ACTUAL_WORK_IN_MINUTES MAINTENANCE_PLAN  MAINTENANCE_ITEM  ...  \
0                   390.0              NaN               NaN  ...   
1                   420.0              NaN               NaN  ...   
2                    30.0              NaN               NaN  ...   
3                    60.0              NaN               NaN  ...   
5                    18.0       G291VM1437           40534.0  ...   

              EQUIPMENT_DESC EQUIP_CAT_DESC EQUIP_START_UP_DATE  \
0  FULL CAN CONVEYOR (FC41B)       Machines          2020-04-22   
1  FULL CAN CONVEYOR (FC41B)       Machines          2020-04-22   
2  FULL CAN CONVEYOR (FC41B)       Machines          2020-04-22   
3  FULL CAN CONVEYOR (FC41B)       Machines          2020-04-22   
5          TANK_STL_STOR_AIR       Machines          2017-02-15   

  EQUIP_VALID_FROM EQUIP_VALID_TO machine_age failure_event Equipment_Type  \
0        2/10/2020     12/31/9999      1473.0             1            NaN   
1        2/10/2020     12/31/9999       874.0             1            NaN   
2        2/10/2020     12/31/9999       973.0             1            NaN   
3        2/10/2020     12/31/9999       803.0             1            NaN   
5       12/11/2023     12/31/9999      2396.0             1            NaN   

  preventive_maintenance  planned_maintenance_intervention  
0                      1                                 1  
1                      1                                 1  
2                      1                                 1  
3                      1                                 1  
5                      1                                 1  

[5 rows x 30 columns]
Remaining unplanned events after intervention: 1005972
       ORDER_ID PLANT_ID PRODUCTION_LOCATION EXECUTION_START_DATE  \
4   704661125.0     G291               MONZA           2023-03-15   
9   704448350.0     G291               MONZA           2022-12-13   
11  705845653.0     G291               MONZA           2024-07-30   
13  704583800.0     G291               MONZA           2023-02-09   
19  704660973.0     G291               MONZA           2023-03-15   

   EXECUTION_FINISH_DATE ACTUAL_START_TIME ACTUAL_FINISH_TIME  \
4              3/15/2023           00:00.0            00:00.0   
9             12/13/2022           00:00.0            00:00.0   
11             7/30/2024           00:00.0            00:00.0   
13              2/9/2023           00:00.0            00:00.0   
19             3/15/2023           00:00.0            00:00.0   

    ACTUAL_WORK_IN_MINUTES MAINTENANCE_PLAN  MAINTENANCE_ITEM  ...  \
4                    120.0              NaN               NaN  ...   
9                     30.0              NaN               NaN  ...   
11                   240.0              NaN               NaN  ...   
13                    30.0              NaN               NaN  ...   
19                    30.0              NaN               NaN  ...   

                 EQUIPMENT_DESC EQUIP_CAT_DESC EQUIP_START_UP_DATE  \
4             TANK_STL_STOR_AIR       Machines          2017-02-15   
9   PUMP_RECIP_HIGH_PRESSURE_NF       Machines          2017-02-15   
11  PUMP_RECIP_HIGH_PRESSURE_NF       Machines          2017-02-15   
13  PUMP_RECIP_HIGH_PRESSURE_NF       Machines          2017-02-15   
19  PUMP_RECIP_HIGH_PRESSURE_NF       Machines          2017-02-15   

   EQUIP_VALID_FROM EQUIP_VALID_TO machine_age failure_event Equipment_Type  \
4        12/11/2023     12/31/9999      2219.0             1            NaN   
9        12/11/2023     12/31/9999      2127.0             1            NaN   
11       12/11/2023     12/31/9999      2722.0             1            NaN   
13       12/11/2023     12/31/9999      2185.0             1            NaN   
19       12/11/2023     12/31/9999      2219.0             1            NaN   

   preventive_maintenance  planned_maintenance_intervention  
4                       0                                 1  
9                       0                                 1  
11                      0                                 1  
13                      0                                 1  
19                      0                                 1  

[5 rows x 30 columns]
Planned events before intervention: 104458
Planned events after intervention: 53250
       Equipment_Type  machine_age  failure_event  preventive_maintenance  \
577200            NaN          NaN              1                       0   
749214            NaN          NaN              1                       0   
919038            NaN          NaN              1                       0   
94933             NaN       1564.0              1                       1   
827505            NaN          NaN              1                       0   
870622            NaN          NaN              1                       0   
538812            NaN          NaN              1                       0   
592517            NaN          NaN              1                       0   
933526         FILLER       1701.0              1                       1   
850844            NaN          NaN              1                       0   

        planned_maintenance_intervention  
577200                                 0  
749214                                 0  
919038                                 0  
94933                                  1  
827505                                 0  
870622                                 0  
538812                                 0  
592517                                 0  
933526                                 1  
850844                                 0  
In [ ]:
# Sort data by equipment and date
data_swire_copy = data_swire_copy.sort_values(by=['EQUIPMENT_ID', 'EXECUTION_START_DATE'])

# Define equipment categories
categories = {
    'filler': r'(filler)',
    'blender': r'(blender)',
    'labeler': r'(labeler)',
    'packer': r'(packer)',
    'palletizer': r'(palletizer)',
    'seamer': r'(seamer)',
    'rinser': r'(rinser)',
    'capper': r'(capper)',
    'tank': r'(tank)',
    'warmer': r'(warmer)',
    'pump': r'(pump)',
    'mixer': r'(mixer)',
    'conveyor': r'(conveyor)',
    'compressor': r'(compressor)',
    'boiler': r'(boiler)',
    'chiller': r'(chiller)'
}

# Categorize equipment
def categorize_equipment(description):
    for category, pattern in categories.items():
        if pd.notna(description) and pd.Series(description).str.contains(pattern, regex=True, case=False).any():
            return category
    return 'other'

data_swire_copy['Equipment_Type'] = data_swire_copy['EQUIPMENT_DESC'].apply(categorize_equipment)

# Calculate MTBF for each equipment type
data_swire_copy['MTBF_Days'] = data_swire_copy.groupby(['Equipment_Type', 'EQUIPMENT_ID'])['EXECUTION_START_DATE'].diff().dt.days

# Group by equipment type and calculate MTBF statistics
mtbf_stats = data_swire_copy.groupby('Equipment_Type')['MTBF_Days'].agg(
    Total_MTBF_Days='sum',
    Average_MTBF_Days='mean',
    Min_MTBF_Days='min',
    Max_MTBF_Days='max',
    Count='count'
).reset_index()

# Filter out the "other" category if necessary
mtbf_stats = mtbf_stats[mtbf_stats['Equipment_Type'] != 'other']

# Display the MTBF statistics
print("\nMTBF Statistics by Equipment Type:")
print(mtbf_stats)
MTBF Statistics by Equipment Type:
   Equipment_Type  Total_MTBF_Days  Average_MTBF_Days  Min_MTBF_Days  \
0         blender           2156.0          29.534247            0.0   
1          boiler           2337.0           9.348000            0.0   
2          capper           2160.0          30.000000            0.0   
3         chiller           4839.0          38.712000            0.0   
4      compressor           2103.0           9.064655            0.0   
5        conveyor          14962.0          67.094170            0.0   
6          filler          14121.0           1.694181            0.0   
7         labeler          28212.0           4.800408            0.0   
8           mixer          15368.0          24.393651            0.0   
10         packer          15184.0           5.338959            0.0   
11     palletizer          21412.0           8.568227            0.0   
12           pump          33068.0          34.445833            0.0   
13         rinser          21214.0           7.859948            0.0   
14         seamer          17749.0           5.336440            0.0   
15           tank          74663.0          40.467751            0.0   
16         warmer          17436.0          14.152597            0.0   

    Max_MTBF_Days  Count  
0           228.0     73  
1           447.0    250  
2           319.0     72  
3           354.0    125  
4           341.0    232  
5          1533.0    223  
6          1040.0   8335  
7          1757.0   5877  
8           770.0    630  
10          477.0   2844  
11          244.0   2499  
12         1936.0    960  
13          792.0   2699  
14         1250.0   3326  
15         2039.0   1845  
16         2027.0   1232  
In [ ]:
import pandas as pd
import numpy as np

# Load data
data_swire_copy = pd.read_csv("IWC_Work_Orders_Extract.csv")
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')
data_swire_copy['EQUIP_START_UP_DATE'] = pd.to_datetime(data_swire_copy['EQUIP_START_UP_DATE'], errors='coerce')
data_swire_copy = data_swire_copy.dropna(subset=['EXECUTION_START_DATE', 'EQUIP_START_UP_DATE'])

# Machine age and survival metrics
data_swire_copy['machine_age'] = (data_swire_copy['EXECUTION_START_DATE'] - data_swire_copy['EQUIP_START_UP_DATE']).dt.days

# Equipment survival statistics (from earlier analysis)
median_survival_times_dict = {'filler': 1256.0, 'packer': 1366.0, 'blender': 1122.0, 'blower': 1617.0, 'labeler': 1230.0}

# 1. Prioritize high-risk equipment
high_risk_equipment = ['blender', 'filler']  # Based on survival curves and unplanned maintenance

# Filter data for high-risk equipment
data_high_risk = data_swire_copy[data_swire_copy['EQUIPMENT_DESC'].str.contains('|'.join(high_risk_equipment), case=False, na=False)]

# 2. Count breakdown frequencies for high-risk equipment
breakdown_counts = data_high_risk['EQUIPMENT_DESC'].value_counts()

# Print breakdown frequencies
print("Breakdown frequencies for high-risk equipment:")
print(breakdown_counts)

# 3. Simulate inventory requirements for high-risk equipment
# Assume 1 spare part required per failure
inventory_requirements = breakdown_counts.copy()

# Add planned and preventive maintenance flags
data_high_risk['planned_or_preventive'] = (
    (data_high_risk['machine_age'] < median_survival_times_dict.get(data_high_risk['EQUIPMENT_DESC'].str.extract(r'(blender|filler)', expand=False).iloc[0], float('inf')))
)

# Simulate scenarios
no_inventory_downtime = breakdown_counts * 2  # Assume 2 hours of downtime per failure without inventory
sufficient_inventory_downtime = breakdown_counts * 0.5  # Assume 30 minutes downtime with inventory

# 4. Compare scenarios
inventory_impact = pd.DataFrame({
    "Equipment": breakdown_counts.index,
    "Failures (Count)": breakdown_counts.values,
    "Downtime Without Inventory (Hours)": no_inventory_downtime.values,
    "Downtime With Inventory (Hours)": sufficient_inventory_downtime.values,
    "Spares Needed": inventory_requirements.values
})

# Display results
print("\nStrategic Inventory Impact Analysis:")
print(inventory_impact)
Breakdown frequencies for high-risk equipment:
EQUIPMENT_DESC
L1 FILLER_ROTARY_CAN_72_VALVE         3745
L1 FILLER_ROTARY_BTL_60_VALVE         2738
L4 FILLER_ROTARY_CAN_100_VALVE        2511
L2 FILLER_ROTARY_BTL_45_VALVE         2072
L3 FILLER_ROTARY_BTL_100_VALVE        1474
L3 FILLER_ROTARY_CAN_124_VALVE         754
L2 LINE 2 FILLER                       354
MICROBLEND MICRO2 BLENDER (LINE 1)      44
L2 MICROBLEND MICRO2 BLENDER            43
MICROBLEND MICRO2 BLENDER (LINE 3)      31
APS - M1, FILLER DISCHARGE              16
L3 FILLER CIP SYSTEM                    16
FILLER ACCUMULATION TABLE               12
ALLIANCE FILLER DISCHARGE - M10          8
TANK_SS_STOR_FILLER_SUPPLY L3            6
TANK_SS_STOR_FILLER_RETURN L3            6
PUMP_CENT_SANIT_FILLER_SUPPLY L3         5
L10 FILLER_ROTARY_CAN_                   4
PUMP_CENT_SANIT_FILLER_RETURN L3         1
PUMP_CENT_SANIT_FILLER_SUPPLY            1
Name: count, dtype: int64

Strategic Inventory Impact Analysis:
                             Equipment  Failures (Count)  \
0        L1 FILLER_ROTARY_CAN_72_VALVE              3745   
1        L1 FILLER_ROTARY_BTL_60_VALVE              2738   
2       L4 FILLER_ROTARY_CAN_100_VALVE              2511   
3        L2 FILLER_ROTARY_BTL_45_VALVE              2072   
4       L3 FILLER_ROTARY_BTL_100_VALVE              1474   
5       L3 FILLER_ROTARY_CAN_124_VALVE               754   
6                     L2 LINE 2 FILLER               354   
7   MICROBLEND MICRO2 BLENDER (LINE 1)                44   
8         L2 MICROBLEND MICRO2 BLENDER                43   
9   MICROBLEND MICRO2 BLENDER (LINE 3)                31   
10          APS - M1, FILLER DISCHARGE                16   
11                L3 FILLER CIP SYSTEM                16   
12           FILLER ACCUMULATION TABLE                12   
13     ALLIANCE FILLER DISCHARGE - M10                 8   
14       TANK_SS_STOR_FILLER_SUPPLY L3                 6   
15       TANK_SS_STOR_FILLER_RETURN L3                 6   
16    PUMP_CENT_SANIT_FILLER_SUPPLY L3                 5   
17              L10 FILLER_ROTARY_CAN_                 4   
18    PUMP_CENT_SANIT_FILLER_RETURN L3                 1   
19       PUMP_CENT_SANIT_FILLER_SUPPLY                 1   

    Downtime Without Inventory (Hours)  Downtime With Inventory (Hours)  \
0                                 7490                           1872.5   
1                                 5476                           1369.0   
2                                 5022                           1255.5   
3                                 4144                           1036.0   
4                                 2948                            737.0   
5                                 1508                            377.0   
6                                  708                            177.0   
7                                   88                             22.0   
8                                   86                             21.5   
9                                   62                             15.5   
10                                  32                              8.0   
11                                  32                              8.0   
12                                  24                              6.0   
13                                  16                              4.0   
14                                  12                              3.0   
15                                  12                              3.0   
16                                  10                              2.5   
17                                   8                              2.0   
18                                   2                              0.5   
19                                   2                              0.5   

    Spares Needed  
0            3745  
1            2738  
2            2511  
3            2072  
4            1474  
5             754  
6             354  
7              44  
8              43  
9              31  
10             16  
11             16  
12             12  
13              8  
14              6  
15              6  
16              5  
17              4  
18              1  
19              1  
In [ ]:
import pandas as pd

# Convert execution start date to datetime and drop rows with invalid dates
data_swire_copy['EXECUTION_START_DATE'] = pd.to_datetime(data_swire_copy['EXECUTION_START_DATE'], errors='coerce')


# Sort data by machine and by date
data_swire_copy = data_swire_copy.sort_values(by=['EQUIPMENT_ID', 'EXECUTION_START_DATE'])

# Calculate the time difference (in days) between consecutive failures for each machine
data_swire_copy['MTBF_Days'] = data_swire_copy.groupby('EQUIPMENT_ID')['EXECUTION_START_DATE'].diff().dt.days

# Calculate breakdown count for each machine
breakdown_count = data_swire_copy['EQUIPMENT_ID'].value_counts()

# Define risk thresholds
# High breakdown count threshold (top 10%)
breakdown_threshold = breakdown_count.quantile(0.9)

# Short MTBF threshold (bottom 10% of all MTBF values)
mtbf_threshold = data_swire_copy['MTBF_Days'].quantile(0.1)

# Create a risk assessment table
risk_table = pd.DataFrame({
    'Breakdown_Count': breakdown_count,
    'Original_MTBF_Days': data_swire_copy.groupby('EQUIPMENT_ID')['MTBF_Days'].apply(list)
})

# Determine risk status based on thresholds
risk_table['Risk_Status'] = risk_table.apply(
    lambda row: 'At Risk' if (row['Breakdown_Count'] >= breakdown_threshold) or
                                (any(mtbf <= mtbf_threshold for mtbf in row['Original_MTBF_Days'] if pd.notna(mtbf))) else 'Low Risk', axis=1
)

# Filter for at-risk equipment
at_risk_equipments = risk_table[risk_table['Risk_Status'] == 'At Risk']

# Sort the at-risk equipment by Breakdown Count in descending order
# and Original MTBF (shortest first)
at_risk_equipments = at_risk_equipments.sort_values(
    by=['Breakdown_Count'],
    ascending=[False]
)

# Get the top 10 at-risk equipment
top_10_at_risk_equipments = at_risk_equipments.head(10)

# Reset index for a clean display
top_10_at_risk_equipments.reset_index().rename(columns={'index': 'Equipment_ID'}, inplace=True)

# Display the top 10 at-risk equipment
print(top_10_at_risk_equipments)
              Breakdown_Count  \
EQUIPMENT_ID                    
300115000.0              3745   
300025792.0              3311   
300017655.0              3060   
300001113.0              2738   
300026891.0              2222   
300017654.0              2124   
300009199.0              1909   
300001078.0              1853   
300017247.0              1712   
300009177.0              1664   

                                             Original_MTBF_Days Risk_Status  
EQUIPMENT_ID                                                                 
300115000.0   [nan, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, ...     At Risk  
300025792.0   [nan, 1.0, 2.0, 0.0, 5.0, 2.0, 0.0, 0.0, 0.0, ...     At Risk  
300017655.0   [nan, 0.0, 1.0, 2.0, 0.0, 0.0, 1.0, 2.0, 1.0, ...     At Risk  
300001113.0   [nan, 2.0, 1.0, 0.0, 1.0, 8.0, 5.0, 1.0, 0.0, ...     At Risk  
300026891.0   [nan, 2.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...     At Risk  
300017654.0   [nan, 0.0, 2.0, 3.0, 0.0, 2.0, 14.0, 0.0, 2.0,...     At Risk  
300009199.0   [nan, 1.0, 5.0, 0.0, 3.0, 0.0, 1.0, 0.0, 1.0, ...     At Risk  
300001078.0   [nan, 0.0, 0.0, 5.0, 2.0, 0.0, 1.0, 1.0, 4.0, ...     At Risk  
300017247.0   [nan, 0.0, 2.0, 1.0, 4.0, 2.0, 2.0, 2.0, 1.0, ...     At Risk  
300009177.0   [nan, 4.0, 3.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, ...     At Risk  

12. Results from Modeling ¶

  • High-Cost Functional Areas and Equipment: Within Swire Coca-Cola’s production, certain functional areas experience particularly high failure rates and shorter survival times, necessitating careful prioritization of maintenance resources. Key areas such as Monza Production and Cota Production exhibit substantial wear, with median survival times of 1,386 days and 1,143 days, respectively. Stocking essential spare parts for these areas, such as conveyor belts, labelers, and fillers, can significantly reduce downtime. Other production nodes, like Suzuka Production with a median survival time of 1,515 days and Silverstone Production at 1,242 days, would similarly benefit from an inventory of critical components. By focusing on these high-impact areas, Swire Coca-Cola can ensure smoother operations with fewer interruptions.

  • Critical Equipment and Parts by Functional Node: In Functional Area Node 1, Production nodes like COTA, MONZA, and SILVERSTONE face high failure rates and survival times that range from 767 days in Production to 1,515 days in Suzuka. Ensuring an inventory of essential parts like rotary fillers, PET bottle fillers, and conveyors will prevent extended downtimes in these frequently maintained nodes. In Functional Area Node 2, key lines such as CAN LINE and BTL_PET_LINE, with survival times of 1,095 days and 1,359 days respectively, require readily available spare parts for rotary can fillers and bottle fillers to maintain continuity. For PET Bottle Lines like L1, L2, and L6, which have a longer survival time of 1,494 days, stocking spare filling heads and sealing mechanisms is essential to reduce repair delays. Functional Area Node 3 equipment, such as FILLER_ROTARY_CAN and BOTTLE_LINE_PET, with median survival times between 1,300 and 1,365 days, would also benefit from preemptive parts inventory. In Functional Area Node 4, critical components like FILLER (1,177 days), PACKER (1,193 days), CONVEYOR (1,139 days), and LABELER (1,524 days) experience regular maintenance needs, making a well-maintained spare parts inventory crucial. Additionally, for Functional Area Node 5, equipment like Packer 1, Model Duodozen 1250SX, with a short survival time of 176 days, and conveyors such as Full Case Conveyors (745 days) and Empty Case Conveyors (711 days) would greatly benefit from readily available replacement parts.

  • Seasonal and Monthly Downtime Costs: The analysis of seasonal and monthly downtime costs highlights critical periods where equipment failures are costlier and more disruptive. Spring and Summer are the most demanding seasons, with downtime costs peaking during these months, underscoring the need for increased maintenance staff and spare parts during this period to minimize disruptions. Monthly trends indicate that March and May are especially challenging, suggesting a proactive strategy where Swire Coca-Cola can stock critical parts in advance and prepare maintenance teams to handle increased workloads. By aligning maintenance efforts with these peak periods, Swire Coca-Cola can improve operational reliability during high-demand seasons.

  • Detailed Recommendations with Numerical Findings: Based on survival time analysis, prioritizing inventory for high-risk equipment is essential for reducing downtime. Functional areas like Monza Production, with a median survival time of 1,386 days, and Cota Production, at 1,143 days, would benefit from maintaining spare parts such as fillers, conveyors, and labelers to avoid lengthy disruptions. In Functional Area Node 2, CAN LINE and BTL_PET_LINE, with survival times of 1,095 days and 1,359 days respectively, should be prioritized for inventory of critical parts, particularly rotary can fillers and bottle fillers, to mitigate downtime risks. Additionally, implementing preventive maintenance based on median survival times will ensure high operational efficiency. For equipment with survival times below 1,000 days, such as Full Case Conveyors (745 days) and Empty Case Conveyors (711 days), frequent preventive maintenance and stocked spare parts will effectively minimize disruptions. Furthermore, equipment with very short survival times, like Packer 1, Model Duodozen 1250SX (176 days), should be subject to routine preventive checks to prevent unexpected stoppages.

  • Use Median Survival Times to Guide Inventory Planning: For assets with high median survival times, like Machine S250 with 2,176 days and Product Line L08 with 1,637 days, a strategic inventory approach can help manage costs while ensuring availability. Conversely, equipment with shorter survival times, such as Packer 1 and Full Case Conveyors, should have their parts consistently stocked due to their frequent maintenance needs. By adopting a data-informed approach to inventory planning based on survival times, Swire Coca-Cola can achieve greater efficiency, reduce downtimes, and support uninterrupted production lines.

13. Summary and Predictions ¶

  • Seasonal Maintenance Planning: By strategically scheduling preventive maintenance during the less demanding Winter and Fall seasons, Swire Coca-Cola can keep its equipment in prime condition before the high-demand Spring and Summer months. This proactive approach prepares machines for peak operation periods, reducing the likelihood of unexpected breakdowns when production is critical.

  • Focus on High-Cost Nodes: Prioritizing high-cost nodes like “Monza Production” and “COTA Production” can bring immediate value by addressing areas with the greatest impact on productivity. By allocating extra maintenance resources to these key areas and implementing rapid-response maintenance solutions, Swire Coca-Cola can significantly reduce repair times and downtime costs where they matter most. This targeted approach ensures that critical production lines remain operational, minimizing disruptions in the most essential areas of the plant.

  • Inventory of Critical Spare Parts: Keeping an inventory of essential spare parts for high-wear equipment, particularly in critical areas like "FILLER," “PACKER,” and “CONVEYOR,” is vital for maintaining continuous production. By ensuring that replacement parts are readily available, Swire Coca-Cola can significantly reduce downtime associated with equipment failures, as repairs can be conducted without delay. Having these critical parts on hand ultimately reinforces the efficiency of maintenance activities and supports uninterrupted productivity in essential production nodes. This proactive approach speeds up recovery times and adds resilience to operations, preventing prolonged disruptions in the production flow.

  • Maintenance Duration Reduction: Streamlining repair processes with pre-assembled maintenance kits and dedicated maintenance staff can significantly reduce repair times, especially in high-cost nodes. By reducing the duration of maintenance activities, Swire Coca-Cola can lessen the financial impact of downtime, allowing for a quicker return to full operational capacity and minimizing disruptions in production. This targeted approach to efficiency reduces costs and boosts overall productivity in critical areas.

  • Emphasizing Preventive Over Corrective Maintenance: Shifting focus towards preventive maintenance can significantly enhance equipment longevity and reduce unexpected breakdowns. This proactive approach has proven more effective in extending equipment life than reactive, corrective maintenance. By incorporating preventive practices, Swire Coca-Cola can maintain a stable production flow, decrease repair frequency, and lower operational disruptions.

  • Monthly and Seasonal Downtime Cost Management: To optimize cost and production efficiency, scheduling preventive maintenance in lower-cost months like September and October is essential. This strategy avoids high-demand, high-cost months such as June,July,May,August where downtimes could result in amplified financial losses. Aligning maintenance schedules with seasonal production demands allows Swire Coca-Cola to mitigate downtime impacts during peak operational periods.

By implementing these targeted maintenance strategies, Swire Coca-Cola can significantly reduce unplanned downtimes, saving up to 10.7 million cases annually in maintenance-related losses. With downtime impacting 5.6% of the total production capacity, these improvements will help Swire Coca-Cola achieve closer to 100% output. This structured approach enhances productivity and optimizes maintenance spending, ensuring more consistent operational uptime and increased production efficiency. Ultimately, these changes will lead to reduced costs, improved profitability, and a stronger competitive position in the market.

14. Individual Contributions ¶

Name Contribution's
Charith
  • Broke down of the dataset to better understand the dataset and case objective.
  • Created analysis that visually supports our understanding of the dataset.
  • Performed seasonal decomposition on maintenance data to extract trends, seasonal, and residual components.
  • Performed modeling techniques like Kaplan-Meier on key packaging lines.
  • Analyzed how the proactive maintenance strategies are reducing the downtime
  • Created visualizations and meticulously worked on the document's organization and readability.
  • Performed document proofreading, ensuring accuracy and clarity throughout the document.
Navigate to top